Surface-bound, double-stranded DNA protein arrays

ABSTRACT

The invention provides a synthetic array of surface-bound, bimolecular, double-stranded nucleic acid molecules, the array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a protein is bound to a member thereof.

FIELD OF INVENTION

[0001] The invention relates to nucleic acid protein arrays.

BACKGROUND OF THE INVENTION

[0002] This application claims the benefit of U.S. Provisional Application No. 60/061,604, filed Oct. 10, 1997.

[0003] Compact arrays or libraries of surface-bound, double-stranded oligonucleotides are of use in rapid, high-throughput screening of proteins to identify those that bind, or otherwise interact with, short, double-stranded DNA sequence motifs. Of particular interest are trans-regulatory factors that control gene transcription. Ideally, such an oligonucleotide array is bound to the surface of a solid support matrix that is of a size that enables laboratory manipulations, e.g. an incubation of a candidate protein with the nucleic acid sequences thereon, and that is itself inert to chemical interactions with experimental proteins, buffers and/or other components. In addition, it is desirable that the absolute number of unique nucleic acid sequences in the array be maximized, since methods of high-throughput screening are used in the attempt to minimize repetition of steps that are labor-intensive or otherwise costly.

[0004] A high-density, double-stranded DNA array complexed to a solid matrix is described by Lockhart (U.S. Pat. No. 5,556,752); however, the DNA molecules therein disclosed are produced as unimolecular products of chemical synthesis. As synthesized, each member of the array contains regions of self-complementarity separated by a spacer (i.e. a single-strand loop), such that these regions hybridize to each other in order to produce a double-helical region. Further, it is required that those regions of complementary nucleic acid sequences that must hybridize in order to form the double-helical structure are physically attached to each other by a linker subunit.

SUMMARY OF THE INVENTION

[0005] The invention provides a synthetic array of surface-bound, bimolecular, double-stranded nucleic acid molecules, the array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a protein is bound to a member thereof.

[0006] The term “synthetic”, as used herein, is defined as that which is produced by in vitro chemical or enzymatic synthesis. The synthetic arrays of the present invention may be contrasted with natural nucleic acid molecules such as viral or plasmid vectors, for instance, which may be propagated in bacterial, yeast, or other living hosts.

[0007] As used herein, the term “nucleic acid” is defined to encompass DNA and RNA of both synthetic and natural origin. The nucleic acid may exist as single- or double-stranded DNA or RNA, an RNA/DNA heteroduplex or an RNA/DNA copolymer, wherein the term “copolymer” refers to a single nucleic acid strand that comprises both ribonucleotides and deoxyribonucleotides.

[0008] As used herein, the term “bimolecular” refers to the fact that the 5′ end of the first strand and 3′ end of the second strand are not linked via a covalent bond, and thus do not form a continuous single strand. As used herein in this context, “covalent bond” is defined as meaning a bond that forms, directly or via a spacer comprising nucleic acid or another material, a continuous strand that comprises the 5′ end of the first strand and the 3′ end of the second strand, and thus includes a 3′/5′ phosphate bond as occurs naturally in a single-stranded nucleic acid. This definition does not encompass intermolecular crosslinking of the first and second strands.

[0009] When used herein in this context, the term “double-stranded” refers to a pair of nucleic acid molecules, as defined above, that exist in a hydrogen-bonded, helical array typically associated with DNA. Included under this umbrella term are those paired oligonucleotides that are essentially double-stranded, meaning those that contain short regions of mismatch, such as a mono-, di- or tri-nucleotide, resulting from design or error either in chemical synthesis of the oligonucleotide priming site on the first nucleic acid strand or in enzymatic synthesis of the second nucleic acid strand; it is contemplated that at least a portion of the members of the array have a second nucleic acid strand which is substantially complementary to, and base-paired with, the first strand along the entire length of the first strand.

[0010] As used herein, the terms “complementary” and “substantially complementary” refer to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double-stranded DNA molecule or between an oligonucleotide primer and a primer binding site on a single-stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Typically, sequences which are complementary will hybridize to each other under stringent conditions. Stringent hybridization conditions will typically include salt concentrations of less than about 1 M, more usually less than about 500 mM, and preferably less than about 200 mM. Alternatively, stringent hybridization conditions typically include at least 10% formamide, preferably 20% and more preferably 40%. Hybridization temperatures can be as low as 5° C., but are typically greater than 22° C., more typically greater than about 30° C., and preferably in excess of about 37° C. Longer fragments may require higher hybridization temperatures for specific hybridization, while those that are rich in dA and dT may require lower temperatures. Two single-stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Sequences that are substantially complementary may hybridize under stringent conditions; however, it is usually necessary to raise the concentration of salt, or lower the concentration of formamide or the hybridization temperature.
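
By way of illustration only, the "at least about 80%" criterion above can be sketched in Python. The sketch assumes two strands of equal length in an ungapped, antiparallel alignment; the text also allows optimal alignment with insertions or deletions, which is not modeled here. The function names and the default threshold parameter are hypothetical and are not part of the disclosure.

```python
# Watson-Crick partners (A pairs with T or U; C pairs with G).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of nucleotides in `first` that pair with `second`.

    Both strands are written 5'->3'. Assumption: equal length, no gaps.
    """
    if not first or len(first) != len(second):
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse the second strand so that facing nucleotides line up by index.
    facing = second.upper()[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(first.upper(), facing))
    return 100.0 * paired / len(first)


def is_substantially_complementary(first: str, second: str,
                                   threshold: float = 80.0) -> bool:
    """True if the strands pair at or above the given percentage threshold."""
    return percent_complementarity(first, second) >= threshold
```

Raising the hypothetical `threshold` argument to 90 or 98 corresponds to the "usually" and "more preferably" ranges stated above.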

[0011] As used herein in reference to nucleic acid members of an array, the term “portion” refers to at least two members of an array. Preferably, a portion refers to a number of individual members of an array, such as at least 60%, 80%, 90% and 95-100% of such members.

[0012] As used herein, the terms “recognition site for a protein” and “recognition site within a nucleic acid sequence for a protein” refer to a nucleic acid sequence which is recognized and/or bound by a protein.

[0013] As used herein with regard to recognition sites within a nucleic acid sequence for a protein, the term “different” refers to two or more nucleic acid sequences which are recognized and/or bound by a protein or proteins, which recognition sites within a nucleic acid sequence for a protein differ in the identity of at least one nucleotide.

[0014] As used herein, the term “array” is defined to mean a heterogeneous pool of nucleic acid molecules that is affixed to a solid support in a spatially-ordered manner, such as a Cartesian distribution (in other words, arranged at defined points along the x- and y-axes of a grid or specific ‘clock positions’ within, or degrees or radii from, the center of a radial pattern) of nucleic acid molecules over the support, that permits identification of individual features during the course of experimental manipulation.

[0015] As used herein, the term “feature” refers to each nucleic acid sequence occupying a discrete physical location on the array; if a given sequence is represented at more than one such site, each site is classified as a feature. A feature comprises one or a plurality of individual, double-stranded, bimolecular nucleic acid molecule members; within a given feature, every such member represents the same sequence.

[0016] According to the invention, the array may have virtually any number of different features.

[0017] In preferred embodiments, the array comprises from 2 up to 100 features, more preferably from 100 up to 10,000 features and highly preferably from 10,000 up to 1,000,000 features, preferably on a solid support. In preferred embodiments, the array will have a density of more than 100 features at known locations per cm², preferably more than 1,000 per cm², more preferably more than 10,000 per cm².

[0018] According to the methods disclosed herein, a “solid support” (or, simply, “support”) is defined as a material having a rigid or semi-rigid surface to which nucleic acid molecules may be attached or upon which they may be synthesized.

[0019] It is contemplated that attached to the solid support is a spacer. The spacer molecule is preferably of sufficient length to permit the double-stranded oligonucleotide in the completed member of the array to interact freely with molecules exposed to the array. The spacer molecule, which may comprise as little as a covalent bond length, is typically 6-50 atoms long to provide sufficient exposure for the attached double-stranded DNA molecule. The spacer is comprised of a surface attaching portion and a longer chain portion.

[0020] It is preferred that the 3′ end of the first strand is linked to the support.

[0021] It is additionally preferred that the 5′ end of the first strand and the 3′ end of the second strand are not linked via a covalent bond.

[0022] Preferably, the 5′ end of the second strand is not linked to the support.

[0023] It is preferred that the recognition site within a nucleic acid sequence for a protein is selected from the group that includes naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins, synthetic variants of naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins and randomized nucleic acid sequences.

[0024] As used herein in reference to recognition sites within a nucleic acid sequence for a protein or proteins, the term “naturally-occurring” refers to such sequences isolated from an organism, wherein those sequences are native to that species or strain of organism and are not the products of genetic engineering, e.g. synthetic sequences, whether transiently transfected or stably incorporated into the genome of a transgenic or transiently-transfected organism or one or more of its ancestor organisms.

[0025] As used herein, the term “allelic variant” refers to a naturally-occurring nucleic acid sequence which is present in a subset of individuals (2-98%) of a population. Such a sequence may function properly (e.g. be recognized by the correct protein) or may be poorly- or non-functional. The term “poorly-functional” refers to a recognition site within a nucleic acid sequence for a protein which, for example, has lowered affinity for its corresponding protein or is recognized and bound by the wrong protein. In this context, a “non-functional” recognition site within a nucleic acid sequence for a protein would be expected to bind background levels of (essentially no) protein. Unless found in a majority of individuals in a population, the sequence of an allelic variant differs in at least one position relative to that of a consensus sequence, as defined below.

[0026] As used herein, the term “mutant variant” refers to a naturally-occurring nucleic acid sequence which occurs at a low frequency (less than 2%) in a population. As is true of an allelic variant, a mutant variant may function properly, poorly or not at all.

[0027] As used herein, the term “synthetic variant” refers to a nucleic acid sequence in which the identity of at least one nucleotide has been altered in vitro, such that it represents no naturally-occurring variant of the sequence upon which it is based. A synthetic variant may function properly, poorly or not at all.

[0028] As used herein with regard to individual nucleic acid sequences, the term “randomized” refers to in vitro-synthesized sequences in which any nucleotide or ribonucleotide can be present at one, more than one or all positions; therefore, for such positions as are randomized, the sequence of the finished molecule is not pre-determined, but is left to chance.

[0029] As used herein with regard to an array of the invention, the term “randomized” refers to an array which is constructed such that, for a sequence of a recognition site within a nucleic acid sequence for a protein of a selected length (e.g. a hexamer), each possible nucleotide combination is comprised by a corresponding feature thereof. In order to realize a complete set of such nucleotide sequence permutations, it is necessary to specify fully the sequence of each feature during synthesis of the array; therefore, while such an array may be referred to as an “array of randomized 6-mers”, the design of the array is entirely non-random.
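
As a non-limiting illustration of why such a design is fully specified rather than random, the complete set of sequences of a selected length can be enumerated and each assigned to a grid position. The Python sketch below assumes a four-letter DNA alphabet and a simple row-by-row Cartesian layout; the function names and the choice of 64 columns are hypothetical and serve only as an example.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGT") -> list[str]:
    """Every possible sequence of length k over the alphabet, in sorted order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def assign_features(sequences: list[str], columns: int) -> dict[str, tuple[int, int]]:
    """Map each sequence to a (row, column) feature position, filled row by row."""
    return {seq: divmod(i, columns) for i, seq in enumerate(sequences)}


hexamers = all_kmers(6)                # 4**6 = 4096 distinct hexamers
layout = assign_features(hexamers, 64)  # a 64 x 64 grid, one hexamer per feature
```

Because every feature's sequence and position are fixed in advance, the sequence at any location that gives a signal can be read directly from the layout.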

[0030] One or more recognition sites within a nucleic acid sequence for a protein or proteins may be present in a given member nucleic acid of an array, wherein “one or more” refers to one, two, three, four, five and even up to 10-20 sites.

[0031] In a preferred embodiment, the recognition site within a nucleic acid sequence for a protein comprises two half-sites, wherein either is recognized by a different protein than is the other.

[0032] As used herein, the term “half-site” refers to a nucleic acid sequence which is recognized and bound by a targeting amino acid sequence present on one protein subunit of a dimeric protein complex. Neither subunit of the dimeric protein complex will bind its cognate half-site alone (i.e., unless dimerized to the other); therefore, either both half-sites are occupied by protein, or neither is. Both half-sites of a recognition site within a nucleic acid sequence for a protein may be identical, whether arranged head-to-tail or as a palindrome (head-to-head or tail-to-tail); if in the latter configuration, the sequence of a recognition site within a nucleic acid sequence for a protein is said to have “dyad symmetry”. Typically, a recognition site within a nucleic acid sequence for a protein bound by a protein homodimer comprises two identical half-sites. Alternatively, the two half-sites comprised by a recognition site within a nucleic acid sequence for a protein may be unlike in sequence; it is usually true that dissimilar half-sites are bound by different targeting amino acid sequences, as would be found on the two subunits of a protein heterodimer. Depending on their orientation relative to one another, recognition sites within a nucleic acid sequence for a protein comprising non-identical, but similar, half-sites may also be said to have dyad symmetry.
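
For illustration, the distinction between a palindromic arrangement and a head-to-tail arrangement of identical half-sites can be sketched in Python. The sketch assumes DNA sequences written 5′ to 3′, two contiguous half-sites of equal length with no intervening spacer, and exact (rather than merely similar) half-sites; the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site reads the same on both strands (a perfect palindrome)."""
    return site.upper() == reverse_complement(site)


def split_half_sites(site: str) -> tuple[str, str]:
    """Split a site into two equal, contiguous half-sites (assumes even length)."""
    if len(site) % 2:
        raise ValueError("site must have an even number of nucleotides")
    mid = len(site) // 2
    return site[:mid], site[mid:]
```

Under these assumptions, `GAATTC` has dyad symmetry, since its second half-site is the reverse complement of its first, whereas the head-to-tail repeat `TGACTGAC` contains identical half-sites but is not a palindrome.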

[0033] As used herein, the term “targeting amino acid sequence” refers to an amino acid sequence present on a protein which sequence recognizes a recognition site within a nucleic acid sequence for a protein on a nucleic acid molecule. A protein may comprise one or a plurality (two or more) of targeting amino acid sequences and bind one or a plurality of different recognition sites within a nucleic acid sequence for a protein or proteins. A given targeting amino acid sequence may recognize and bind one recognition site within a nucleic acid sequence for a protein or different recognition sites within a nucleic acid sequence for a protein or proteins on a nucleic acid molecule. “Different targeting amino acid sequences”, herein defined as those which differ by at least one amino acid, may recognize and bind the same recognition site within a nucleic acid sequence for a protein or proteins, different recognition sites within a nucleic acid sequence or sequences for a protein or proteins, or two partially-overlapping sets of different recognition sites within a nucleic acid sequence for a protein or proteins on a nucleic acid molecule.

[0034] It is contemplated that different targeting amino acid sequences, as defined above, may exist on a single polypeptide molecule; typically, however, different targeting amino acid sequences are found on different polypeptide molecules that are of use in the invention. If a polypeptide should possess two or more targeting amino acid sequences, and these targeting amino acid sequences differ in the sequence of at least one amino acid (whether or not they differ in binding-site specificity), that single polypeptide molecule comprises more than one different protein, as defined herein.

[0035] The term “half-site” is not applicable to a recognition site within a nucleic acid sequence for a protein (whether in whole or in part) which is recognized by a protein that binds nucleic acids alone, rather than in a di- or multimeric complex, regardless of the presence of any internal symmetry or repetition of sequence in such a recognition site within a nucleic acid sequence for a protein.

[0036] As used herein, the term “different protein” refers to two or more proteins which differ in the identity of at least one amino acid within a targeting amino acid sequence.

[0037] It is contemplated that different recognition sites within a nucleic acid sequence for a protein on a nucleic acid molecule or molecules may be recognized and bound by the same targeting amino acid sequence, by different targeting amino acid sequences, or by two partially-overlapping sets of different targeting amino acid sequences of a protein or proteins.

[0038] It is preferred that the protein which is bound to a member thereof comprises a detectable label.

[0039] Preferably, the protein is a chimeric protein.

[0040] As used herein, the term “chimeric” refers to a protein which comprises fused sequences of two or more polypeptides that are, themselves, different in amino acid sequence and are typically encoded by different genes. The term “different genes” may refer to allelic or mutant variants of a gene present at a single genetic locus; preferably, it refers to two or more genes which are found at a corresponding number of genetic loci, and which may be selected from one or more individual organisms or species of organism. A chimeric protein may be advantageously produced by the in-frame fusion and subsequent expression of nucleic acid sequences encoding the component amino acid sequences. Such amino acid sequences may each comprise an entire protein; alternatively, one or more sequences comprised by a chimeric protein may be a fragment of a protein. Typically, each segment is sufficient in scope to retain its native biological activity (e.g. a targeting amino acid sequence which binds a recognition site within a nucleic acid sequence for a protein on a nucleic acid molecule in the context of its native protein will do so in the context of the chimera).

[0041] It is contemplated that a chimeric (or “fusion”) protein according to the invention comprises a protein which binds a recognition site within a nucleic acid sequence for a protein, fused to a second protein component comprising any one of a receptor, an enzyme, a candidate enzyme domain such as a kinase or a protease domain, a candidate protein:protein dimerization domain, a candidate ligand binding domain, or a substrate for a protein-directed enzymatic reaction. In this context, a “protein” is either a whole protein or a protein fragment which retains its ability to recognize and bind specifically to a recognition site within a nucleic acid sequence for a protein on a nucleic acid molecule to which site the native, whole protein binds.

[0042] As used herein, the term “domain” refers to a portion of a protein molecule which is sufficient for the performance of a given function, whether in the presence or absence of other sequences of the protein. It is contemplated that a domain is encoded by an uninterrupted amino acid sequence, such that it may be physically cleaved whole away from other amino acid sequence elements and such that it will fold properly without the influence of neighboring sequences.

[0043] It is preferred that the chimeric protein comprises a DNA-binding domain fused in-frame with a protein:protein dimerization domain.

[0044] As used herein with regard to protein domains, the term “DNA-binding” refers to a function of the domain, which is to bind to a recognition site within a nucleic acid sequence for a protein on a DNA molecule.

[0045] In another preferred embodiment, the chimeric protein comprises a DNA-binding domain fused in-frame to Green Fluorescent Protein.

[0046] Preferably, the solid support is a silica support.

[0047] It is preferred that the first strand is produced by chemical synthesis and the second strand is produced by enzymatic synthesis.

[0048] Preferably, the first strand is used as the template on which the second strand is enzymatically produced.

[0049] It is preferred that the first strand of each member contains at its 3′ end a binding site for an oligonucleotide primer which is used to prime enzymatic synthesis of the second strand, and at its 5′ end a variable sequence.
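
The layout of the first strand and the templated synthesis of the second strand can be modeled, purely as an illustrative sketch, in Python. The sketch assumes DNA strands written 5′ to 3′, a primer that is exactly complementary to the binding site, and complete, error-free extension; the function names and the example sequences are hypothetical and are not taken from the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_first_strand(variable: str, primer_site: str) -> str:
    """First strand, 5'->3': variable sequence at the 5' end, primer site at the 3' end."""
    return variable + primer_site


def extend_primer(first_strand: str, primer: str) -> str:
    """Model second-strand synthesis templated by the first strand.

    The primer anneals to the 3' end of the template and is extended from its
    own 3' end, so the finished second strand is the reverse complement of the
    whole first strand and begins (5') with the primer.
    """
    if not first_strand.endswith(reverse_complement(primer)):
        raise ValueError("primer does not match the 3' end of the first strand")
    return reverse_complement(first_strand)


primer_site = "GCATCGTAGCTAGGCA"             # hypothetical shared 16-nt binding site
primer = reverse_complement(primer_site)      # one primer serves every member
first = build_first_strand("ACGTTG", primer_site)
second = extend_primer(first, primer)
```

In this model the same primer can be used for every member of the array, while the variable 5′ portion of each first strand determines the remainder of its second strand.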

[0050] The term “oligonucleotide primer”, as used herein, refers to a single-stranded DNA or RNA molecule that is hybridized to a nucleic acid template to prime enzymatic synthesis of a second nucleic acid strand.

[0051] Preferably, enzymatic synthesis is performed using an enzyme.

[0052] In a preferred embodiment, the oligonucleotide primer is between 10 and 30 nucleotides in length.

[0053] It is preferred that the first strand comprises DNA.

[0054] It is additionally preferred that the second strand comprises DNA.

[0055] Preferably, the first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.

[0056] Use of the term “monomer” is made to indicate any of the set of molecules which can be joined together to form an oligomer or polymer. The set of monomers useful in the present invention includes, but is not restricted to, for the example of oligonucleotide synthesis, the set of nucleotides consisting of adenine, thymine, cytosine, guanine, and uridine (A, T, C, G, and U, respectively) and synthetic analogs thereof. As used herein, “monomer” refers to any member of a basis set for synthesis of an oligomer. Different basis sets of monomers may be used at successive steps in the synthesis of a polymer.

[0057] Preferably, at least a portion of the plurality have a second nucleic acid strand that is substantially complementary to, and base-paired with, the first strand along the entire length of the first strand.

[0058] As used herein in reference to a plurality of nucleic acid members of an array, the term “portion” refers to at least two members of an array. Preferably, a portion refers to a number of individual members of an array, such as at least 60%, 80%, 90% and 95-100% of such members.

[0059] Another aspect of the present invention is a method for the construction of a synthetic array of surface-bound, bimolecular, double-stranded nucleic acid molecules, comprising the steps of providing an array of first nucleic acid strands linked to a solid support, hybridizing to the first strands an oligonucleotide primer that is substantially complementary to a sequence comprised by a first strand, performing enzymatic synthesis of a second nucleic acid strand that is complementary to a first strand so as to permit Watson-Crick base pairing and so as to form an array comprising a plurality of bimolecular, double-stranded nucleic acid molecule members, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein and wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, and incubating the array with a protein sample comprising a protein under conditions that permit specific binding of the protein to a member of the array, such that a protein becomes bound to a recognition site within a nucleic acid sequence for a protein on a member to form a nucleic acid protein array.

[0060] Preferably, the 3′ end of the first strand is linked to the support.

[0061] It is preferred that the 5′ end of the first strand and the 3′ end of the second strand are not linked via a covalent bond.

[0062] It is additionally preferred that the 5′ end of the second strand is not linked to the solid support.

[0063] Preferably, the recognition site within a nucleic acid sequence for a protein is selected from the group that includes naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins, synthetic variants of naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins and randomized nucleic acid sequences.

[0064] Preferably, the recognition site within a nucleic acid sequence for a protein comprises two half-sites, wherein either is recognized by a different protein than is the other.

[0065] It is preferred that the protein which is bound to a member of the array comprises a detectable label.

[0066] It is also preferred that the protein is a chimeric protein.

[0067] In a particularly preferred embodiment, the chimeric protein comprises a DNA-binding domain fused in-frame with a protein:protein dimerization domain.

[0068] It is also particularly preferred that the chimeric protein comprises a DNA-binding domain fused in-frame to Green Fluorescent Protein.

[0069] Preferably, the solid support is a silica support.

[0070] It is preferred that the first strand of each member contains at its 3′ end a binding site for an oligonucleotide primer which is used to prime enzymatic synthesis of the second strand, and at its 5′ end a variable sequence, wherein the binding site is present in each member of the array.

[0071] Preferably, enzymatic synthesis is performed using an enzyme.

[0072] In a preferred embodiment, the oligonucleotide primer is between 10 and 30 nucleotides in length.

[0073] It is preferred that the first strand comprises DNA.

[0074] It is additionally preferred that the second strand comprises DNA.

[0075] Preferably, the first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.

[0076] In a highly preferred embodiment, the solid support is a silica support and the first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.

[0077] Preferably, the protein sample comprises a candidate inhibitor of binding of the protein to a recognition site within a nucleic acid sequence for a protein on a member of the array.

[0078] It is preferred that the protein sample comprises a candidate inhibitor of binding of the protein to a second protein.

[0079] The invention also encompasses a method of determining a consensus nucleic acid sequence for a recognition site within a nucleic acid sequence in a nucleic acid molecule for a protein comprising the steps of providing a nucleic acid protein array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a protein comprising a detectable label is bound to a member thereof, and performing a detection step to detect the presence of the label on a feature of the array, wherein nucleotides that are shared among the recognition sites within a nucleic acid sequence for a protein present on features on which the label is detected form a consensus nucleic acid sequence for a recognition site within a nucleic acid sequence for a protein specific for the protein.

[0080] As defined herein in reference to recognition sites within a nucleic acid sequence for a protein or proteins, the term “consensus” refers to a common nucleic acid sequence wherein the nucleotide at each position thereof represents that which is most frequently found in recognition sites within a nucleic acid sequence for a selected protein or group of proteins. A consensus sequence may be identical to a naturally-occurring recognition site within a nucleic acid sequence for a protein; alternatively, it may have a sequence which does not occur naturally in the genome of an organism.

[0081] As used herein, the term “shared” refers to a nucleotide or ribonucleotide which is present in all, or substantially all, sequences compared, wherein substantial sharing is defined as the presence in 75% or more of said sequences of a given nucleotide or ribonucleotide at a specified position.
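
As an illustrative sketch only, the definitions of “consensus” and “shared” given above can be applied to a set of recognition-site sequences read from labeled features. The Python sketch below assumes that the sequences are already aligned and of equal length; the function names, the use of `N` to mark positions that are not shared, and the example sequences are hypothetical.

```python
from collections import Counter


def _columns(sites: list[str]):
    """Yield the nucleotides found at each position across all sites."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    return zip(*(s.upper() for s in sites))


def consensus(sites: list[str]) -> str:
    """Most frequently found nucleotide at each position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in _columns(sites))


def shared_positions(sites: list[str], fraction: float = 0.75,
                     placeholder: str = "N") -> str:
    """Keep a nucleotide only where it occurs in at least `fraction` of the sites."""
    out = []
    for col in _columns(sites):
        base, count = Counter(col).most_common(1)[0]
        out.append(base if count / len(col) >= fraction else placeholder)
    return "".join(out)


bound_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGATTCA"]  # hypothetical features
```

For the hypothetical sequences shown, the fourth position is C in only two of four sequences, so it contributes to the consensus but does not meet the 75% criterion for a shared nucleotide.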

[0082] The invention additionally provides a method of identifying for a first protein which binds a nucleic acid as half of a protein:protein heterodimer complex one or a plurality of candidate second proteins with which it might dimerize and bind a nucleic acid molecule in vivo, comprising the steps of providing a nucleic acid array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, wherein a binding site comprises two half-sites and wherein either of the half-sites of a recognition site within a nucleic acid sequence for a protein is recognized by a different protein than is the other, incubating the array with a protein sample comprising a first protein which recognizes a first half-site of a recognition site within a nucleic acid sequence for a protein and one or a plurality of candidate second proteins under conditions which permit heterodimerization of a first and candidate second protein and binding of a protein:protein heterodimer to a recognition site within a nucleic acid sequence for a protein, recovering a protein:protein heterodimer complex from a member of the array under conditions whereby the first protein and candidate second protein dissociate from one another, and identifying the candidate second protein, wherein each candidate second protein so identified represents a protein with which the first protein may dimerize in vivo.

[0083] Preferably, identifying of the candidate second protein comprises sequencing thereof.

[0084] In another preferred embodiment, identifying of the candidate second protein comprises binding of the candidate second protein to an antibody which is specific therefor.

[0085] It is preferred that the first protein comprises a detectable label.

[0086] It is additionally preferred that the method further comprises the step of performing a detection step to detect the presence of the label on a feature of the array, wherein the recognition site within a nucleic acid sequence for a protein present on a feature upon which the label is detected represents a candidate recognition site within a nucleic acid sequence for a protein which the heterodimer may bind in vivo.

[0087] The invention also provides a method of identifying candidate members of a set of co-regulated genes, comprising the steps of providing a nucleic acid protein array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a protein comprising a detectable label is bound to a member thereof, and performing a detection step to detect the presence of the label on a feature of the array, wherein a gene having among its regulatory sequences one or more of the recognition sites within a nucleic acid sequence for a protein present on a feature on which the label is detected is characterized as a candidate member of a set of co-regulated genes that are regulated by the protein.

[0088] A “set of co-regulated genes” refers to a number of genes, in the range of about 2 to about 30 genes, that exhibit a given response (in terms of gene expression) to an external stimulus or a given response to a mutation in a specific gene. An example of the latter is where a mutation in the coding region of gene X results in a change in expression levels of genes A-Z. The term “co-regulated set of genes” additionally encompasses genes which are normally under the control of a common trans-regulatory factor, such as a protein. The upper limit on the number in a set of co-regulated genes (i.e., “positives” or up-regulated genes; or “negatives” or down-regulated genes) may be on the order of several thousand.

[0089] Another aspect of the present invention is a method of assaying a candidate inhibitor of protein/nucleic acid interactions, comprising the steps of providing a nucleic acid array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, incubating the array with a protein sample comprising a protein comprising a detectable label and a candidate inhibitor of binding of the protein to a recognition site within a nucleic acid sequence for a protein on a member of the array, under conditions which normally permit binding of the protein to that member, and performing a detection step to detect the presence of the label on the member, wherein the presence of the label on the member corresponds with binding of the protein to the member and wherein the negation of, or reduction in, binding of the protein to the member is indicative of efficacy of the candidate inhibitor of protein:nucleic acid interactions in inhibiting binding of the protein to the recognition site within a nucleic acid sequence for a protein.

[0090] Such protein:nucleic acid interactions include, but are not limited to, recognition of cis-regulatory elements by transcription factors, which may include receptors or polymerase subunits, binding of nucleic acid molecules by structural proteins, such as histones or cytoskeletal components, and recognition of a nucleic acid molecule by restriction or other endonucleases, exonucleases and nucleic acid modification enzymes (such as methylases, ligases, phosphatases, isomerases, transposases or other recombinases, glycosylases and kinases).

[0091] The final aspect of the present invention is a method of assaying a candidate inhibitor of a protein/protein interaction, comprising the steps of providing a nucleic acid array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a member comprising a first nucleic acid strand linked to the solid support and a second nucleic acid strand which is substantially complementary to the first strand and complexed to the first strand by Watson-Crick base pairing, wherein for at least a portion of the members, each member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, incubating the array with a protein sample comprising a first protein comprising a detectable label, wherein binding of the first protein to a recognition site within a nucleic acid sequence for a protein on a member of the array is dependent upon an interaction between the first protein and a second protein and wherein the protein sample further comprises the second protein and a candidate inhibitor of the interaction, under conditions which normally permit the interaction, and performing a detection step to detect the presence of the label on a member of the array, wherein the presence of the label on a member corresponds with binding of the protein to that member and wherein the negation of, or reduction in, binding of the protein to the member is indicative of efficacy of the candidate inhibitor in inhibiting the interaction between the first protein and the second protein.

[0092] Such protein:protein interactions include, but are not limited to, ligand/receptor interactions, enzyme/substrate interactions, interactions between subunits of a nucleic acid polymerase, and interactions between molecules of homo- or heterodimeric or -multimeric complexes.

[0093] The utilization of bimolecular, double-stranded, nucleic acid arrays comprising recognition sites within a nucleic acid sequence for a protein or proteins or that of nucleic acid/protein arrays according to the invention provides an improvement over prior art methods in that while the first strand of the DNA duplex is chemically-synthesized on the support matrix, the second strand is enzymatically produced using the first strand as a template. While the error rate in production of the first strand remains the same, increased fidelity of second strand synthesis is expected to result in a higher percentage of points on the matrix surface that are filled by hybridized DNA duplex molecules that can serve as targets for protein binding or other assays. In addition, oligonucleotide priming of second nucleic acid strand synthesis obviates the need for covalent linkage of complementary regions, with the effect of reducing extraneous sequence or non-nucleic acid material from the array, as well as eliminating steps of designing and synthesizing such a linker.

[0094] Further features and advantages of the invention will become more fully apparent in the following description of the embodiments and drawings thereof, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0095] FIG. 1 presents a schematic summary of light-directed DNA synthesis.

[0096] FIG. 2 presents a photomicrograph of a fluorescently-labeled array of bimolecular, double-stranded DNA molecules on a silica chip.

[0097] FIG. 3 presents confocal argon laser scanning to detect fluorescently-labeled, surface-bound nucleic acid molecules.

[0098] FIG. 4 presents RsaI digestion of a fluorescently-labeled array of bimolecular, double-stranded DNA molecules on a silica chip.

[0099] FIG. 5 presents binding of Green Fluorescent Protein to an array of bimolecular, double-stranded DNA molecules on a silica chip, and confocal argon laser scanning to detect the bound protein.

DESCRIPTION OF THE INVENTION

[0100] Double-Stranded Protein Arrays According to the Invention

[0101] The invention is based on double-stranded nucleic acid molecule protein arrays, wherein at least two double-stranded nucleic acid molecules contain one or more recognition sites within a nucleic acid sequence for a protein, such that a recognition site within a nucleic acid sequence of a first member of the array is different from a recognition site within a nucleic acid sequence of a second member of the array.

[0102] Described below is how to prepare an array of immobilized first strands, how to prepare and/or design a primer useful according to the invention, how to prime synthesis of a second strand that is complementary to, and duplexed with, the first array-bound strand, how to incorporate a sequence specifying a recognition site within a nucleic acid sequence for a protein, and how to bind a protein thereto.

[0103] Nucleic acid arrays of the invention are prepared as described herein below in the section entitled “Double-Stranded Bimolecular Nucleic Acid Arrays”.

[0104] The nucleic acid array is prepared using nucleic acid sequences containing recognition sites within a nucleic acid sequence for a protein or proteins.

[0105] Protein and Recognition Sequences Therefor Useful According to the Invention

[0106] A recognition site within a nucleic acid sequence for a protein useful according to the invention may be based on a naturally-occurring DNA sequence or synthetic (modified) version of such a sequence which is of higher or lower affinity for a given protein than is a corresponding natural sequence. Recognition sites within a nucleic acid sequence for a protein useful according to the invention include, but are not limited to, the following E. coli recognition sites within a nucleic acid sequence for proteins which bind DNA:

Gene Encoding Protein    Recognition Site for a Protein (Uppercase=base most frequently observed at that position)
FadR                     ATCTGGTACGACCAGAT [SEQ ID NO: 3]
Ada                      AAAGCGCA
Crp                      aaaTGTGAtct agaTCACAttt [SEQ ID NO: 4]
HsdM                     AAC(n₆)GTGC [SEQ ID NO: 5]
HsdR                     AAC(n₆)GTGC [SEQ ID NO: 5]
CI_434                   ACAAtat ataTTGT [SEQ ID NO: 6]
Cro_434                  ACAAtat ataTTGT [SEQ ID NO: 6]
TrpR                     ACTAgtt
Lrp                      AgaATw n wATtcT [SEQ ID NO: 7]
MetJ                     AGACGTCT
MalI                     ATAAAac gtTTTAT [SEQ ID NO: 8]
Fnr                      aTTGATnn nnATCAAt [SEQ ID NO: 9]
OxyR                     ATyG(n₆)CrAT [SEQ ID NO: 10]
RpoH32                   ccccc(n₁₈)cccc [SEQ ID NO: 11]
Rafk                     cCGAAAc gTTTCGg [SEQ ID NO: 12]
Dcm                      CCWGG
NhaR                     cgcartattcaygytgrtgat [SEQ ID NO: 13]
RpoN54                   ciggo(n₇) ttgca [SEQ ID NO: 14]
PhoB                     CTkTCATAwAwCTGTCAy [SEQ ID NO: 15]
Fur                      GAAAATAATTCTTATTTCG [SEQ ID NO: 16]
Dam                      GATC
DnaB                     GATCTnTTnTTTT [SEQ ID NO: 17]
SoxS                     GCAC(n₇)CAA [SEQ ID NO: 18]
MalT                     GGAKGA
GalR                     gTGTAAncgnTTACAc [SEQ ID NO: 19]
RpoS38                   gttaag(n₁₈)cgtcc [SEQ ID NO: 20]
LexA                     taCTGTatat atatACAGta [SEQ ID NO: 21]
EbgR                     tAGTAAaa n ttTTACTa [SEQ ID NO: 22]
CI_lam                   tATCACcg n gcGTGATa [SEQ ID NO: 23]
Cro_lam                  tATCACcg n gcGTGATa [SEQ ID NO: 23]
HipB                     TATCC(N₈)GGATA [SEQ ID NO: 24]
MetR                     TGAA(n₅) TTCA [SEQ ID NO: 25]
FruR                     TGAAAC GTTTCA [SEQ ID NO: 26]
ArgR                     tGAATan ntATTCa [SEQ ID NO: 27]
NtrC                     TGCACCWW n ww GGTGCA [SEQ ID NO: 28]
TyrR                     TGTAAA(N₆)TTTACA [SEQ ID NO: 29]
DicA                     TGTTAnGYyA TrrCnTAACA [SEQ ID NO: 30]
DicC                     TGTTAnGYYA TrrCnTAACA [SEQ ID NO: 30]
AraC                     TnTGGAC(n₆)GCTA [SEQ ID NO: 31]
DnaA                     TTATCCACA
RpoD70                   ttgaca(n₁₆₋₁₈)tataat [SEQ ID NO: 32, 33 and 34]
CytR                     tTGAwCn nGwTCAt [SEQ ID NO: 35]
IlvY                     TTGC (n₆) GCAA [SEQ ID NO: 36]
C2_lam                   TTGC(n₆)TTGC [SEQ ID NO: 37]
LacI                     tTGTGAgc(n₀₋₁)gcTCACAa [SEQ ID NO: 38 and 39]
DeoR                     tTGTTAgaa ttcTAACAa [SEQ ID NO: 40]
KorB                     TTTAGC n GCTAAA [SEQ ID NO: 41]
HimA                     WATCAANNNNTTR [SEQ ID NO: 42]
GlpR                     wATGTTCGwT AwCGAACATw [SEQ ID NO: 43]
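
Many of the recognition sites listed above use degenerate base symbols (for example n, w, y, r and k) and fixed or variable spacers such as (n₆). As a minimal sketch, assuming the standard IUPAC meanings of those symbols, a consensus of this kind could be converted into a regular expression and used to locate candidate sites in a sequence. The function names and the test sequence are hypothetical; the spacer is written with ASCII digits, e.g. `(n6)` or `(n16-18)`; and the uppercase/lowercase distinction in the table (frequency of the base at that position) is deliberately ignored.

```python
import re

# Standard IUPAC nucleotide codes used in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Convert a consensus such as 'AAC(n6)GTGC' into a regex pattern.
    '(nK)' is a spacer of K bases; '(nK-M)' is a spacer of K to M bases."""
    site = site.replace(" ", "")
    parts = []
    i = 0
    while i < len(site):
        spacer = re.match(r"\(n(\d+)(?:-(\d+))?\)", site[i:], flags=re.IGNORECASE)
        if spacer:
            low, high = spacer.group(1), spacer.group(2)
            parts.append("[ACGT]{%s}" % (low if high is None else low + "," + high))
            i += spacer.end()
        else:
            parts.append(IUPAC[site[i].upper()])  # KeyError on unknown symbols
            i += 1
    return "".join(parts)


def find_sites(site, sequence):
    """Return start indices of non-overlapping matches of `site` in `sequence`."""
    pattern = re.compile(site_to_regex(site))
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(site_to_regex("AAC(n6)GTGC"))            # HsdM/HsdR consensus
print(find_sites("CCWGG", "TTCCAGGAACCTGG"))   # Dcm consensus, invented sequence
```

Only one strand is scanned here; a fuller treatment would also scan the reverse complement and would weight uppercase positions more heavily than lowercase ones.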

[0107] Nucleic Acid/Protein Array Assays

[0108] Assays according to the invention include incubation of a nucleic acid array (produced as described below) with a protein, wherein the nucleic acid member molecules of the array comprise at least two recognition sites for a protein, such that a recognition site for a protein of a first member of the array is different from a recognition site for a protein of a second member of the array. The buffer used in the assay is generally a physiological buffer which does not result in denaturation of the protein; for example, a no-salt or low-salt buffer at neutral pH. Such a buffer might include 0-1 M salt, 1-100 mM Tris-HCl, pH 8.0. The protein may be present in the buffer in the subpicomolar-to-millimolar range, for example, in the micromolar-to-nanomolar range. The incubation is performed at about physiological temperature for those proteins that are active at this temperature, or may be performed at low temperature (0° C.) using, for example, frost-tolerant proteins of certain plants, or at very high temperatures (even up to 100° C.) using thermophilic proteins.

[0109] Double-Stranded Bimolecular Nucleic Acid Arrays

[0110] I. Preparation of an Array of Immobilized First Nucleic Acid Strands

[0111] Synthesis of a nucleic acid array useful according to the present invention is a bipartite process, which entails the production of a diverse array of single-stranded nucleic acid molecules that are immobilized on the surface of a solid support matrix, followed by priming and enzymatic synthesis of a second nucleic acid strand, comprising either RNA or DNA. A highly preferred method of carrying out synthesis of the immobilized single-stranded array is that of Lockhart, described in U.S. Pat. No. 5,556,752, the contents of which are herein incorporated by reference. Of the methods described therein, that which is of particular use describes the synthesis of such an array on the surface of a single solid support having a plurality of preselected regions. A method whereby each chemically-distinct feature of the array is synthesized on a separate solid support is also described by Lockhart. These methods, and others, are briefly summarized below.

[0112] The solid support may comprise biological, nonbiological, organic or inorganic materials, or a combination of any of these. It is contemplated that such materials may exist as particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates or slides. Preferably the solid support takes the form of plates or slides, small beads, pellets, disks or other convenient forms. It is highly preferred that at least one surface of the support is substantially flat. The solid support may take on alternative surface configurations. For example, the solid support may contain raised or depressed regions on which synthesis takes place. In some instances, the solid support will be chosen to provide appropriate light-absorbing characteristics. For example, the support may be a polymerized Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO₂, SiN₄, modified silicon, or one of a variety of gels or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Other suitable solid support materials may be used, and will be readily apparent to those of skill in the art. Preferably, the surface of the solid support will contain reactive groups, which could be carboxyl, amino, hydroxyl, thiol, or the like. More preferably, the surface will be optically transparent and will have surface Si-OH functionalities, such as are found on silica surfaces.

[0113] According to the invention, a first nucleic acid strand is anchored to the solid support by as little as an intermolecular covalent bond. Alternatively, a more elaborate linking molecule may attach the nucleic acid strand to the support. Such a molecular tether may comprise a surface-attaching portion which is directly attached to the solid support. This portion can be bound to the solid support via carbon-carbon bonds using, for example, supports having (poly)trifluorochloroethylene surfaces, or preferably, by siloxane bonds (using, for example, glass or silicon oxide as the solid support). Siloxane bonds with the surface of the support can be formed via reactions of surface attaching portions bearing trichlorosilyl or trialkoxysilyl groups. The surface attaching groups will also have a site for attachment of the longer chain portion. It is contemplated that suitable attachment groups may include amines, hydroxyl, thiol, and carboxyl groups. Preferred surface attaching portions include aminoalkylsilanes and hydroxyalkylsilanes. It is particularly preferred that the surface attaching portion of the spacer is selected from the group comprising bis(2-hydroxyethyl)-aminopropyltriethoxysilane, 2-hydroxyethylaminopropyltriethoxysilane, aminopropyltriethoxysilane and hydroxypropyltriethoxysilane.

[0114] The longer chain portion of the spacer can be one of a variety of molecules which are inert to the subsequent conditions for polymer synthesis, examples of which include: aryl acetylene, ethylene glycol oligomers containing 2-14 monomer units, diamines, diacids, amino acids, peptides, or combinations thereof. It is contemplated that the longer chain portion is a polynucleotide. The longer chain portion which is to be used as part of the spacer can be selected based upon its hydrophilic/hydrophobic properties to improve presentation of the double-stranded oligonucleotides to certain receptors, proteins or drugs. It can be constructed of polyethyleneglycols, polynucleotides, alkylene, polyalcohol, polyester, polyamine, polyphosphodiester and combinations thereof.

[0115] Additionally, for use in synthesis of the arrays of the invention, the spacer will typically have a protecting group, attached to a functional group (i.e., hydroxyl, amino or carboxylic acid) on the distal or terminal end of the chain portion (opposite the solid support). After deprotection and coupling, the distal end is covalently bound to an oligomer.

[0116] As used in discussion of the spacer region, the term “alkyl” refers to a saturated hydrocarbon radical which may be straight-chain or branched-chain (for example, ethyl, isopropyl, t-amyl, or 2,5-dimethylhexyl). When “alkyl” or “alkylene” is used to refer to a linking group or a spacer, it is taken to be a group having two available valences for covalent attachment, for example, —CH₂CH₂—, —CH₂CH₂CH₂—, —CH₂CH₂CH(CH₃)CH₂— and —CH₂(CH₂CH₂)₂CH₂—. Preferred alkyl groups as substituents are those containing 1 to 10 carbon atoms, with those containing 1 to 6 carbon atoms being particularly preferred. Preferred alkyl or alkylene groups as linking groups are those containing 1 to 20 carbon atoms, with those containing 3 to 6 carbon atoms being particularly preferred. The term “polyethylene glycol” is used to refer to those molecules which have repeating units of ethylene glycol, for example, hexaethylene glycol (HO—(CH₂CH₂O)₅—CH₂CH₂OH). When the term “polyethylene glycol” is used to refer to linking groups and spacer groups, it would be understood by one of skill in the art that other polyethers of polyols could be used as well (i.e., polypropylene glycol or mixtures of ethylene and propylene glycols).

[0117] The term “protecting group”, as used herein, refers to any of the groups which are designed to block one reactive site in a molecule while a chemical reaction is carried out at another reactive site. More particularly, the protecting groups used herein can be any of those groups described in Greene et al., 1991, Protective Groups In Organic Chemistry, 2nd Ed., John Wiley & Sons, New York, N.Y., incorporated herein by reference. The proper selection of protecting groups for a particular synthesis will be governed by the overall methods employed in the synthesis. For example, in “light-directed” synthesis, discussed below, the protecting groups will be photolabile protecting groups, e.g. NVOC and MeNPOC. In other methods, protecting groups may be removed by chemical methods and include groups such as FMOC, DMT and others known to those of skill in the art.

[0118] a. Nucleic Acid Arrays on a Single Support

[0119] 1. Light-Directed Methods

[0120] Where a single solid support is employed, the oligonucleotides of the present invention can be formed using a variety of techniques known to those skilled in the art of polymer synthesis on solid supports. For example, “light-directed” methods, techniques in a family of methods known as VLSIPS™ methods, are described in U.S. Pat. No. 5,143,854 and U.S. Pat. No. 5,510,270 and U.S. Pat. No. 5,527,681, which are herein incorporated by reference. These methods, which are illustrated in FIG. 1 (adapted from Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A., 91: 5022-5026), involve activating predefined regions of a solid support and then contacting the support with a preselected monomer solution. These regions can be activated with a light source, typically shone through a mask (much in the manner of photolithography techniques used in integrated circuit fabrication). Other regions of the support remain inactive because illumination is blocked by the mask and they remain chemically protected. Thus, a light pattern defines which regions of the support react with a given monomer. By repeatedly activating different sets of predefined regions and contacting different monomer solutions with the support, a diverse array of polymers is produced on the support. Other steps, such as washing unreacted monomer solution from the support, can be used as necessary. Other applicable methods include mechanical techniques such as those described in PCT No. 92/10183 and U.S. Pat. No. 5,384,261, also incorporated herein by reference for all purposes. Still further techniques include bead based techniques such as those described in PCT US/93/04145, also incorporated herein by reference, and pin based methods such as those described in U.S. Pat. No. 5,288,514, also incorporated herein by reference.

[0121] The VLSIPS™ methods are preferred for making the compounds and arrays of the present invention. The surface of a solid support, optionally modified with spacers having photolabile protecting groups such as NVOC and MeNPOC, is illuminated through a photolithographic mask, yielding reactive groups (typically hydroxyl groups) in the illuminated regions. A 3′-O-phosphoramidite activated deoxynucleoside (protected at the 5′-hydroxyl with a photolabile protecting group) is then presented to the surface and chemical coupling occurs at sites that were exposed to light. Following capping and oxidation, the support is rinsed and the surface illuminated through a second mask, to expose additional hydroxyl groups for coupling. A second 5′-protected, 3′-O-phosphoramidite activated deoxynucleoside is presented to the surface. The selective photodeprotection and coupling cycles are repeated until the desired set of oligonucleotides is produced. Alternatively, an oligomer of from, for example, 4 to 30 nucleotides can be added to each of the preselected regions rather than synthesize each member one nucleotide monomer at a time.
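
The repeated cycle of masked illumination followed by coupling can be pictured with a simplified model in which each cycle deprotects a chosen set of regions and extends the strands there by one monomer. The sketch below is only an illustration under that assumption; the function name, the region indices and the masks are hypothetical, and coupling is treated as error-free.

```python
def light_directed_synthesis(n_regions, cycles):
    """Simulate masked synthesis on a support with `n_regions` regions.

    `cycles` is a list of (mask, monomer) pairs, where `mask` is the set of
    region indices exposed to light in that cycle. Each strand is returned in
    order of addition, i.e. with the surface-proximal monomer first."""
    strands = [""] * n_regions
    for mask, monomer in cycles:
        for region in mask:
            if not 0 <= region < n_regions:
                raise ValueError("mask refers to a region outside the support")
            # Only illuminated (deprotected) regions couple the monomer.
            strands[region] += monomer
    return strands


# Four regions, four cycles: every dinucleotide below is made in two steps.
cycles = [({0, 1}, "A"), ({2, 3}, "C"), ({0, 2}, "G"), ({1, 3}, "T")]
print(light_directed_synthesis(4, cycles))  # ['AG', 'AT', 'CG', 'CT']
```

Because each mask can address any subset of regions, the number of distinct sequences grows much faster than the number of coupling cycles, which is the practical appeal of the approach.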

[0122] 2. Flow Channel or Spotting Methods

[0123] Additional methods applicable to array synthesis on a single support are described in U.S. Pat. No. 5,384,261, incorporated herein by reference for all purposes. In the methods disclosed in these applications, reagents are delivered to the support by either (1) flowing within a channel defined on predefined regions or (2) “spotting” on predefined regions. Other approaches, as well as combinations of spotting and flowing, may be employed as well. In each instance, certain activated regions of the support are mechanically separated from other regions when the monomer solutions are delivered to the various reaction sites.

[0124] A typical “flow channel” method applied to arrays of the present invention can generally be described as follows: Diverse polymer sequences are synthesized at selected regions of a solid support by forming flow channels on a surface of the support through which appropriate reagents flow or in which appropriate reagents are placed. For example, assume a monomer “A” is to be bound to the support in a first group of selected regions. If necessary, all or part of the surface of the support in all or a part of the selected regions is activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire support with appropriate reagents. After placement of a channel block on the surface of the support, a reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first selected regions, thereby binding the monomer A to the support directly or indirectly (via a spacer) in the first selected regions.

[0125] Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation, rotation, or replacement of the channel block on the surface of the support; through opening or closing a selected valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding monomer B at the second selected locations. In this particular example, the resulting sequences bound to the support at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences of desired length at known locations on the support.

[0126] After the support is activated, monomer A can be flowed through some of the channels, monomer B can be flowed through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the reaction regions are reacted with a monomer before the channel block must be moved or the support must be washed and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing and activation steps can be minimized.

[0127] One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the support. For example, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the support to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.

[0128] The “spotting” methods of preparing compounds and arrays of the present invention can be implemented in much the same manner. A first monomer, A, can be delivered to and coupled with a first group of reaction regions which have been appropriately activated. Thereafter, a second monomer, B, can be delivered to and reacted with a second group of activated reaction regions. Unlike the flow channel embodiments described above, reactants are delivered in relatively small quantities by directly depositing them in selected regions. In some steps, the entire support surface can be sprayed or otherwise coated with a solution, if it is more efficient to do so. Precisely measured aliquots of monomer solutions may be deposited dropwise by a dispenser that moves from region to region. Typical dispensers include a micropipette to deliver the monomer solution to the support and a robotic system to control the position of the micropipette with respect to the support, or an ink-jet printer. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the reaction regions simultaneously.

[0129] 3. Pin-Based Methods

[0130] Another method which is useful for the preparation of the immobilized arrays of single-stranded DNA molecules X of the present invention involves “pin-based synthesis.” This method, which is described in detail in U.S. Pat. No. 5,288,514, previously incorporated herein by reference, utilizes a support having a plurality of pins or other extensions. The pins are each inserted simultaneously into individual reagent containers in a tray. An array of 96 pins is commonly utilized with a 96-container tray, such as a 96-well microtitre dish.

[0131] Each tray is filled with a particular reagent for coupling in a particular chemical reaction on an individual pin. Accordingly, the trays will often contain different reagents. Since the chemical reactions have been optimized such that each of the reactions can be performed under a relatively similar set of reaction conditions, it becomes possible to conduct multiple chemical coupling steps simultaneously. The invention provides for the use of support(s) on which the chemical coupling steps are conducted. The support is optionally provided with a spacer, S, having active sites. In the particular case of oligonucleotides, for example, the spacer may be selected from a wide variety of molecules which can be used in organic environments associated with synthesis as well as aqueous environments associated with binding studies such as may be conducted between the nucleic acid members of the array and other molecules. These molecules include, but are not limited to, proteins (or fragments thereof), lipids, carbohydrates, proteoglycans and nucleic acid molecules. Examples of suitable spacers are polyethyleneglycols, dicarboxylic acids, polyamines and alkylenes, substituted with, for example, methoxy and ethoxy groups. Additionally, the spacers will have an active site on the distal end. The active sites are optionally protected initially by protecting groups. Among a wide variety of protecting groups which are useful are FMOC, BOC, t-butyl esters, t-butyl ethers, and the like.

[0132] Various exemplary protecting groups are described in, for example, Atherton et al., 1989, Solid Phase Peptide Synthesis, IRL Press, incorporated herein by reference. In some embodiments, the spacer may provide for a cleavable function by way of, for example, exposure to acid or base.

[0133] b. Arrays on Multiple Supports

[0134] Yet another method which is useful for synthesis of compounds and arrays of the present invention involves “bead based synthesis.” A general approach for bead based synthesis is described in PCT/US93/04145 (filed Apr. 28, 1993), the disclosure of which is incorporated herein by reference.

[0135] For the synthesis of molecules such as oligonucleotides on beads, a large plurality of beads are suspended in a suitable carrier (such as water) in a container. The beads are provided with optional spacer molecules having an active site to which is complexed, optionally, a protecting group.

[0136] At each step of the synthesis, the beads are divided for coupling into a plurality of containers. After the nascent oligonucleotide chains are deprotected, a different monomer solution is added to each container, so that on all beads in a given container, the same nucleotide addition reaction occurs. The beads are then washed of excess reagents, pooled in a single container, mixed and re-distributed into another plurality of containers in preparation for the next round of synthesis. It should be noted that by virtue of the large number of beads utilized at the outset, there will similarly be a large number of beads randomly dispersed in the container, each having a unique oligonucleotide sequence synthesized on a surface thereof after numerous rounds of randomized addition of bases. As pointed out by Lockhart (U.S. Pat. No. 5,556,752) an individual bead may be tagged with a sequence which is unique to the double-stranded oligonucleotide thereon, to allow for identification during use.
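
The divide, couple, pool and mix cycle described above can be sketched as a small simulation. This is an illustrative model only: the function name, the bead count, the use of one container per monomer and the seeded random shuffle standing in for physical mixing are all assumptions made for the example.

```python
import random


def split_and_pool(n_beads, monomers, rounds, seed=0):
    """Simulate bead based synthesis by repeated divide/couple/pool rounds.

    Each bead is represented by the sequence grown on it. In every round the
    pooled beads are mixed, divided among one container per monomer, and each
    container couples its own monomer to every bead it holds."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(rounds):
        rng.shuffle(beads)  # pooling and mixing
        # Divide the pool into len(monomers) roughly equal containers.
        containers = [beads[i::len(monomers)] for i in range(len(monomers))]
        # Couple one monomer per container, then pool the beads again.
        beads = [seq + monomer
                 for monomer, container in zip(monomers, containers)
                 for seq in container]
    return beads


beads = split_and_pool(n_beads=1000, monomers="ACGT", rounds=3)
print(len(beads), "beads carrying", len(set(beads)), "distinct sequences")
```

Each bead carries a single sequence because it visits exactly one container per round, while the pool as a whole samples the possible sequences (at most 4³ = 64 for three rounds with four monomers).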

[0137] II. Preparation of Oligonucleotide Primers

[0138] Oligonucleotide primers useful to synthesize bimolecular arrays are single-stranded DNA or RNA molecules that are hybridizable to a nucleic acid template to prime enzymatic synthesis of a second nucleic acid strand. The primer may therefore be of any sequence composition or length, provided it is complementary to a portion of the first strand.

[0139] It is contemplated that such a molecule is prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a fragment thereof may be naturally occurring, and may be isolated from its natural source or purchased from a commercial supplier. It is contemplated that oligonucleotide primers employed in the present invention will be 6 to 100 nucleotides in length, preferably from 10 to 30 nucleotides, although oligonucleotides of different length may be appropriate.

[0140] Additional considerations with respect to design of a selected primer relate to duplex formation, and are described in detail in the following section.

[0141] III. Primed Enzymatic Second-Strand Nucleic Acid Synthesis to form a Double-Stranded Array

[0142] Of central importance in carrying out preparation of a bimolecular array is selective hybridization of an oligonucleotide primer to the first nucleic acid strand in order to permit enzymatic synthesis of the second nucleic acid strand. Any of a number of enzymes well known in the art can be utilized in the synthesis reaction. Preferably, enzymatic synthesis of the second strand is performed using an enzyme selected from the group comprising DNA polymerase I (exo⁽⁻⁾ Klenow fragment), T4 DNA polymerase, T7 DNA polymerase, modified T7 DNA polymerase, Taq DNA polymerase, exo⁽⁻⁾ vent DNA polymerase, exo⁽⁻⁾ deep vent DNA polymerase, reverse transcriptase and RNA polymerase.

[0143] Typically, selective hybridization will occur when two nucleic acid sequences are substantially complementary (typically, at least about 65% complementary over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa, M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that a certain degree of mismatch at the priming site can be tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which we define as regions in which mismatch encompasses an uninterrupted series of four or more nucleotides. Note that such loops within the oligonucleotide priming site are encompassed by the present invention; however, the invention does not provide double-stranded nucleic acids that comprise loop structures between the 5′ end of the first strand and the 3′ end of the second strand. In addition, loop structures outside the priming site, but which do not encumber the 5′ end of the first strand or the 3′ end of the second strand, are not provided by the present invention, since there is no known mechanism for generating such structures in the course of enzymatic second-strand nucleic acid synthesis. Both the 5′ end of the first strand and the 3′ end of the second strand must be free of attachment to each other via a continuous single strand.
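The percent-complementarity criterion stated above can be computed for a primer and a candidate priming site as follows. This is a rough sketch under stated assumptions: both strands are DNA written 5′ to 3′, they are of equal length and aligned antiparallel without gaps, and the function name `percent_complementary` is hypothetical. The primer shown is primer 1s of Example 1; the mismatched site is invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def percent_complementary(primer, site):
    """Percent of positions at which `primer` pairs with `site`.

    Both sequences are written 5'->3' and are assumed to be the same
    length, aligned antiparallel with no gaps (an assumption of this sketch).
    """
    if len(primer) != len(site):
        raise ValueError("sequences must be the same length")
    partner = site.upper()[::-1]  # read the site 3'->5' to face the primer
    paired = sum(p.translate(COMPLEMENT) == s
                 for p, s in zip(primer.upper(), partner))
    return 100.0 * paired / len(primer)

primer_1s = "TCCACACTCTCCAACA"
perfect_site = "TGTTGGAGAGTGTGGA"    # exact reverse complement of primer 1s
mismatched_site = "TGTTGGAGAGTGTGGC"  # hypothetical single-base mismatch

print(percent_complementary(primer_1s, perfect_site))
print(percent_complementary(primer_1s, mismatched_site))
```

A single mismatch in a 16-nucleotide priming site still leaves the pair well above the 90% level mentioned in the text.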

[0144] Either strand may comprise RNA or DNA. Overall, five factors influence the efficiency and selectivity of hybridization of the primer to the immobilized first strand. These factors are (i) primer length, (ii) the nucleotide sequence and/or composition, (iii) hybridization temperature, (iv) buffer chemistry and (v) the potential for steric hindrance in the region to which the primer is required to hybridize.

[0145] There is a positive correlation between primer length and both the efficiency and accuracy with which a primer will anneal to a target sequence; longer sequences have a higher T_(M) than do shorter ones, and are less likely to be repeated within a given first nucleic acid strand, thereby cutting down on promiscuous hybridization. Primer sequences with a high G-C content or that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution; at the same time, it is important to design a primer containing sufficient numbers of G-C nucleotide pairings to bind the target sequence tightly, since each such pair is bound by three hydrogen bonds, rather than the two that are found when A and T bases pair. Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration of organic solvents, e.g. formamide, that might be included in a hybridization mixture, while increases in salt concentration facilitate binding. Under stringent hybridization conditions, longer probes must be used, while shorter ones will suffice under more permissive conditions. Stringent hybridization conditions will typically include salt concentrations of less than about 1M, more usually less than about 500 mM and preferably less than about 200 mM. Hybridization temperatures can be as low as 5° C., but are typically greater than 22° C., more typically greater than about 30° C., and preferably in excess of about 37° C. Longer fragments may require higher hybridization temperatures for specific hybridization. As several factors may affect the stringency of hybridization, the combination of parameters is more important than the absolute measure of any one alone.
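The dependence of duplex stability on length and G-C content can be made concrete with a small calculation. The sketch below counts hydrogen bonds (three per G-C pair, two per A-T pair, as stated above) and applies the Wallace rule, T_(M) ≈ 2(A+T) + 4(G+C) in ° C., a common rule of thumb for short oligonucleotides. The Wallace rule is an assumption introduced here for illustration; it is not the method used by the primer-design programs named in this specification, and its output should not be expected to match their estimates. The function names are hypothetical.

```python
def base_counts(seq):
    """Return (A+T count, G+C count) for a DNA sequence."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return at, gc

def gc_fraction(seq):
    at, gc = base_counts(seq)
    return gc / (at + gc)

def hydrogen_bonds(seq):
    """Hydrogen bonds in a perfectly matched duplex: 2 per A-T, 3 per G-C."""
    at, gc = base_counts(seq)
    return 2 * at + 3 * gc

def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C (short oligos only)."""
    at, gc = base_counts(seq)
    return 2 * at + 4 * gc

primer = "TCCACACTCTCCAACA"  # primer 1s of Example 1
print(gc_fraction(primer), hydrogen_bonds(primer), wallace_tm(primer))
```

Lengthening a primer or raising its G-C content increases both figures, consistent with the trade-off the paragraph describes between tight binding and the tendency of G-C-rich sequences to self-hybridize.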

[0146] Primers must be designed with the above first four considerations in mind. While estimates of the relative merits of numerous sequences can be made mentally, computer programs have been designed to assist in the evaluation of these several parameters and the optimization of primer sequences. Examples of such programs are “PrimerSelect” of the DNAStar™ software package (DNAStar, Inc.; Madison, Wis.) and OLIGO 4.0 (National Biosciences, Inc.). Once designed, suitable oligonucleotides may be prepared by the phosphoramidite method described by Beaucage and Carruthers, 1981, Tetrahedron Lett., 22: 1859-1862, or by the triester method according to Matteucci et al., 1981, J. Am. Chem. Soc., 103: 3185, both incorporated herein by reference, or by other chemical methods using either a commercial automated oligonucleotide synthesizer or VLSIPS™ technology (discussed in detail below).

[0147] The fifth consideration, steric hindrance, is one that was of particular relevance to the development of the invention disclosed herein. While methods for the primed, enzymatic synthesis of second nucleic acid strands from immobilized first strands are known in the art (see Uhlen, U.S. Pat. No. 5,405,746 and Utermohlen, U.S. Pat. No. 5,437,976), the present method differs in that the priming site, as determined by the location of the 3′ end of the first strand (X), is adjacent to the surface of the solid support. In a typical silica-based chip array, made as per Lockhart (U.S. Pat. No. 5,556,752), a 20 μm² region carries approximately 4×10⁶ functional copies of a specific sequence, with an intermolecular spacing distance of about 100 Å (Chee et al., 1996, Science, 274: 610-614). As a result, it is necessary that the oligonucleotide primer hybridize efficiently to an anchored target in a confined space, and that synthesis proceed outward from the support. In the above-referenced disclosures, it is the 5′ end of the first oligonucleotide strand which is linked to the matrix; therefore, priming of the free end of that molecule is permitted, and second-strand extension proceeds toward the solid support. Under the circumstances, significant uncertainty existed as to whether oligonucleotide priming of the end of the first strand proximal to the solid support would occur at a sufficiently high frequency to yield a high-density double-stranded nucleic acid array.
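The density figures quoted above can be checked with simple arithmetic. The sketch below assumes molecules on a square lattice with 100 Å between neighbors; under that assumption, the stated copy number is reproduced if the region is read as a square 20 μm on a side (400 μm² in area) rather than a region of 20 μm² total area. Both the lattice model and that reading are assumptions of this sketch, and the function name is hypothetical.

```python
ANGSTROMS_PER_MICROMETER = 1e4

def copies_in_region(area_um2, spacing_angstrom=100.0):
    """Molecules on a square lattice with the given neighbor spacing."""
    spacing_um = spacing_angstrom / ANGSTROMS_PER_MICROMETER  # 0.01 um
    area_per_molecule_um2 = spacing_um ** 2                   # 1e-4 um^2
    return area_um2 / area_per_molecule_um2

# Region read as 20 um x 20 um: about 4 x 10^6 copies, as stated.
print(f"{copies_in_region(20 * 20):.2e}")
# Region read as 20 um^2 of total area: about 2 x 10^5 copies.
print(f"{copies_in_region(20):.2e}")
```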

EXAMPLE 1

[0148] This example illustrates the general synthesis of an array of bimolecular, double-stranded oligonucleotides on a solid support, which arrays may comprise recognition sites for a protein or proteins.

[0149] As a first step, single-stranded DNA molecules were synthesized on a solid support using standard light-directed methods (VLSIPS™ protocols), as described above, using the method of Lockhart, U.S. Pat. No. 5,556,752, the contents of which are incorporated above by reference.

[0150] Hexaethylene glycol (PEG) linkers were used to covalently attach the synthesized oligonucleotides to the derivatized glass surface. A heterogeneous array of linkers was formed such that some sectors of the silica chip had linkers comprising two PEG molecules, while other sectors bore linkers comprising a single PEG molecule (FIG. 2). In addition, the intermolecular distance between linker molecules (and, consequently, nascent nucleic acid strands) was varied such that for either length of linker and for each of the 9,600 distinct molecular species synthesized, there were 15 different chip sectors representing the following range of strand densities. These densities, expressed as the percent of total anchoring sites occupied by nucleic acid molecules, are shown in Table 1.

TABLE 1
% of sites filled    % of sites filled, cont'd.    % of sites filled, cont'd.
 0.4                 25.0                           69.1
 1.6                 31.5                           75.8
 3.1                 39.7                           83.1
 6.2                 50.0                           91.2
12.5                 63.0                          100.0

[0151] Synthesis of the first strand proceeded one nucleotide at a time using repeated cycles of photo-deprotection and chemical coupling of protected nucleotides. The nucleotides each had a protecting group on the base portion of the monomer as well as a photolabile MeNPoc protecting group on the 5′ hydroxyl. Note that each of the different molecular species occupies a different physical region on the chip so that there is a one-to-one correspondence between molecular identity and physical location. Moving outward from the chip, the sequence of each molecule proceeds from its 3′ to its 5′ end (the 3′ end of the DNA molecule is attached to the solid surface via a silyl group and 2 PEG linkers), as is the case when chemical synthetic methods are utilized.

[0152] Second strand synthesis, as stated above, requires priming of a site at the 3′ end of the first nucleic acid strand, followed by enzymatic extension of the primed sequence. DNA polymerase I (exo⁽⁻⁾ Klenow fragment) was employed in this experiment, although numerous other enzymes, as discussed above, may be employed advantageously. This particular enzyme is optimally active at 37° C.; therefore, two priming sites and the corresponding complementary primers were designed that were predicted to bind efficiently and yet exhibit a minimum of secondary structure at that temperature according to calculations performed by the DNAStar “PrimerSelect” computer program, which was employed for this purpose. The sequences of these primers were as follows:

1s 5′--TCCACACTCTCCAACA--3′ (estimated T_(M) = 36.8° C.) [SEQ ID NO: 1]
2s 5′--GGACCCTTTGACTTGA--3′ (estimated T_(M) = 38.7° C.) [SEQ ID NO: 2]
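Because each primer must anneal to a complementary site at the 3′ end of the immobilized first strand, the sequence of that priming site is the reverse complement of the primer. The following sketch derives the priming sites for the two primers listed above; the helper name `reverse_complement` and the dictionary layout are illustrative choices, not part of the disclosure.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (written 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]

primers = {
    "1s": "TCCACACTCTCCAACA",  # SEQ ID NO: 1
    "2s": "GGACCCTTTGACTTGA",  # SEQ ID NO: 2
}

# Priming site on the first strand, written 5'->3'; its 3' end lies
# nearest the solid support.
priming_sites = {name: reverse_complement(seq) for name, seq in primers.items()}
for name, site in priming_sites.items():
    print(name, "anneals to 5'-" + site + "-3'")
```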

[0153] Note that the optimal reaction temperature varies considerably among polymerases. Also of use according to the methods of the invention are exo⁽⁻⁾ vent DNA polymerase and exo⁽⁻⁾ deep vent DNA polymerase (both commercially available from New England Biolabs, Beverly, Mass.), which are optimally active at 72° C. and approximately 30% active at 50° C., according to the manufacturer. Were these enzymes used instead, longer primer sequences, or those with a higher G-C content, would have to have been employed.

[0154] In the case of the synthesis presented in FIG. 2, primer 1s [SEQ ID NO: 1] was used. The reaction conditions were as follows:

[0155] Prehybridization of chip: 0.005% Triton X-100, 0.2 mg/ml acetylated bovine serum albumin (BSA), 10 mM Tris-HCl (pH 7.5), 5 mM MgCl₂ and 7.5 mM dithiothreitol (DTT) at 37° C. for 30 to 60 minutes on a rotisserie.

[0156] Second-strand primer extension and fluorescein labeling: 0.005% Triton, 10 mM Tris-HCl (pH 7.5), 5 mM MgCl₂, 7.5 mM DTT, 0.4 mM dNTPs, 0.4 μM primer, 0.04 U/μl DNA Polymerase I (3′ to 5′ exo⁽⁻⁾ Klenow fragment, New England Biolabs, Beverly, Mass.) and 0.0004 mM of fluorescein-12-labeled dATP at 37° C. for 1 to 2 hours on a rotisserie, followed by a wash in 0.005% Triton X-100 in 6×SSPE at room temperature. (Note that an alternate labeling procedure, not used in the experiment presented in this Example, is one in which unlabeled extension is performed, followed by labeled primer extension using terminal deoxynucleotidyl transferase. This reaction takes place as follows: 0.005% Triton X-100, 10 mM Tris acetate, pH 7.5, 10 mM magnesium acetate, 50 mM potassium acetate, 0.044 U/μl terminal transferase and 0.014 mM of any fluorescein-12-labeled dideoxynucleotide at 37° C. for 1 to 2 hours on a rotisserie, followed by a wash in 0.005% Triton X-100 in 6×SSPE at room temperature.)

[0157] To confirm that second-strand synthesis had taken place, the chip was scanned under a layer of wash buffer for fluorescence in an argon laser confocal scanner (see U.S. Pat. No. 5,578,832). This device exposes the molecules of the array to irradiation at a wavelength of 488 nanometers, which excites electrons in the fluorescein moiety, resulting in fluorescent emissions, which are then recorded at each position of the chip (FIG. 3). Since the first strand was unlabeled, the efficiency of second-strand synthesis can be measured. The result is shown in FIG. 2, where various sectors of the chip fluoresce with different intensities, in proportion both to strand density and to the proportion of dATP residues in the second strand.

[0158] Further confirmation of successful second-strand synthesis was gained from a biochemical assay of the chip. According to the first-strand synthesis procedure, several sectors of the chip were designed such that the several unique sequences synthesized at those positions contained a 4 base motif which, when double-stranded, would form a recognition site for the endonuclease RsaI. The chip was digested in RsaI, using the manufacturer's recommended incubation conditions. Upon re-scanning of the chip in the argon laser scanner, a dark area appeared. This can be seen in FIG. 2, and is shown in detail in FIG. 4. Since the ability of the enzyme to cleave the sequence from the chip is dependent upon the sequence being double-stranded, synthesis, at least to the point of the RsaI recognition site, must have occurred.
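Which array members would be expected to lose signal on RsaI digestion can be predicted by scanning their sequences for the enzyme's recognition motif. The sketch below assumes the commonly documented RsaI site, GTAC, cut between T and A to leave blunt ends; the specification itself states only that the motif is 4 bases long. The function names and the example sequence are hypothetical.

```python
RSAI_SITE = "GTAC"   # assumed recognition sequence (GT^AC, blunt cut)
CUT_OFFSET = 2       # cut falls after the second base of the site

def find_sites(seq, site=RSAI_SITE):
    """Return 0-based start positions of `site` in `seq`."""
    seq = seq.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def digest(seq, site=RSAI_SITE, cut_offset=CUT_OFFSET):
    """Return the fragments produced by cutting at every site."""
    fragments, previous = [], 0
    for start in find_sites(seq, site):
        fragments.append(seq[previous:start + cut_offset])
        previous = start + cut_offset
    fragments.append(seq[previous:])
    return fragments

example = "AAGTACTT"  # hypothetical array member
print(find_sites(example), digest(example))
```

Because GTAC is its own reverse complement, a site found on the first strand is also present on the second strand, so a single-strand scan suffices.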

[0159] In addition to providing evidence of successful second-strand synthesis, cleavage of double-stranded nucleic acid molecules from the solid support with RsaI demonstrates that members of the array are accessible to proteins in solution, a requirement if the arrays of the invention are to be useful in carrying out assays of protein/DNA interactions.

EXAMPLE 2

[0160] Isolation of Proteins which Bind a Candidate Recognition Site for a Protein of an Array

[0161] An array of double-stranded nucleic acid molecules is made as described in Example 1, comprising test nucleic acid sequences of unknown protein-binding characteristics that are a) chosen because comparative sequence analysis or functional studies of a gene promoter implicates them as gene regulatory elements or b) generated de novo for use according to the invention. Alternatively, nucleic acid sequences that have been found to bind at least one known protein are used (see Example 3, below); a number of recognition sites for known proteins are listed above.

[0162] After nucleic acid synthesis, a sample comprising a plurality of protein molecules is incubated with the array under conditions which permit protein:nucleic acid binding, as described above; such conditions may be relatively stringent (high salt, approximately 1M) or, if proteins are to be recovered which might bind recognition sites for a protein or proteins in vivo that are related (but not identical) to sequences comprised by features of the array, lower salt concentrations (0 to 100 mM) are used. Unbound protein molecules are then washed away. Bound proteins are eluted from the array using a high salt buffer, and transferred to a suitable storage buffer either through dialysis against, or precipitation and resuspension in, such a buffer. Proteins are separated by any chromatographic or electrophoretic procedure known in the art, e.g. two-dimensional gel electrophoresis, and then sequenced, also by standard methods, such as by mass spectrometry (e.g., liquid chromatography/electrospray ionization/ion trap tandem mass spectrometry) or Edman degradation.

[0163] Following identification of the bound proteins, their relative affinities for the recognition sites for a protein or proteins are, if desired, assayed singly by binding them to chips or chromatography supports to which are complexed oligonucleotides representing isolated sequences of the array and eluting them off in buffers of gradually increasing ionic strength; binding affinity is directly proportional to the salt concentration required to remove a given protein from a nucleic acid molecule. Alternatively, such binding affinities may be determined as described below in Example 7.

EXAMPLE 3

[0164] Assessment of Factors which Influence Binding of a Protein to a Recognition Site for a Protein

[0165] In addition to changes in salt concentration in an in vitro system (which do not normally reflect conditions which would occur in vivo), it is desirable to examine factors which might, in a living system, influence or be made to influence nucleic acid/protein interactions. This method is applicable if it is advantageous to inhibit binding of a protein to a particular recognition site for a protein in order to nullify its influence (appropriate or otherwise) on a given gene; alternatively, one might attempt to promote binding of such a protein to the cis-regulatory sequence of a gene for which the appropriate trans-regulatory factor is absent or defective. Such a procedure, in which the affinity of the phage λ 434 Cro protein for its cognate recognition site for a protein is examined, is described in this example.

[0166] A λ 434 Cro protein array is provided as follows:

[0167] In one embodiment of the invention, the DNA molecules referred to in Example 1 are synthesized so as to include the sequence ACAAtatataTTGT [SEQ ID NO: 6], which specifies the recognition site for the λ 434 Cro protein.
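The recognition sequence given above has dyad symmetry: read 5′ to 3′, it is identical on both strands, and its two outer tetranucleotides form matching half sites. A short check, with hypothetical helper names, is sketched below.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_dyad_symmetric(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

def half_sites(seq, length):
    """Return both half sites, each read 5'->3' on its own strand."""
    seq = seq.upper()
    return seq[:length], reverse_complement(seq[-length:])

cro_site = "ACAAtatataTTGT"  # SEQ ID NO: 6
print(is_dyad_symmetric(cro_site), half_sites(cro_site, 4))
```

The two half sites are identical (ACAA), as would be expected for a site bound by a homodimer.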

[0168] λ 434 Cro protein is provided as described in the prior art, and is brought to a concentration of approximately 100 nM in 10 mM NaCl, 50 mM Tris-HCl, pH 8.0, and incubated on the nucleic acid array made according to the invention (as described above) for approximately 5 minutes at 37° C.

[0169] The λ 434 Cro nucleic acid/protein array is used according to the invention in several ways:

[0170] a) Binding affinities of other mutant Cro proteins, relative to λ 434 Cro, may be determined by binding labeled λ 434 Cro to the array in competition either with unlabeled λ 434 Cro (as a control) or the mutant test protein, also unlabeled. The degree to which each protein is able to prevent binding of labeled λ 434 Cro to the nucleic acid molecules of the array is indicative of its binding strength relative to that of λ 434 Cro, as judged by the amount of label which is detected on the array after unbound proteins are washed off. The amount of label present is inversely proportional to the affinity of the test protein for the recognition site for the λ 434 Cro protein.

[0171] b) The relative binding affinities of λ 434 Cro protein for mutant recognition sites for the λ 434 Cro protein are tested by incubating an array produced as above (wherein the λ 434 Cro protein molecules are, additionally, labeled) with double-stranded oligonucleotides comprising the mutant sites for λ 434 Cro protein. The amount of label present on the array is quantified both before incubation and after the oligonucleotides are washed away; the difference in label still attached to the array relative to a comparably-treated control in which no competitor or a non-specific competitor (such as poly dI-dC or a population of random oligomers) is used is proportional to the affinity of λ 434 Cro protein for the mutant recognition sites for λ 434 Cro protein. Alternatively, both the labeled λ 434 Cro protein and the oligonucleotides are present together in a buffer in which a nucleic acid array produced as described above is incubated. A control incubation, containing no mutant oligonucleotides, is set up in parallel, and the amount of labeled protein bound to each is quantified.

[0172] c) Inhibitors of the binding interaction between λ 434 Cro protein and the recognition site for λ 434 Cro protein may be tested by either of the methods described in a) and b). Candidate inhibitors include substances which directly compete with λ 434 Cro for its recognition site or that compete with that recognition site for binding to λ 434 Cro protein, such as other proteins with higher affinity for the recognition site for λ 434 Cro protein than that of λ 434 Cro protein itself or nucleic acid molecules comprising engineered recognition sites for a protein for which λ 434 Cro protein may have higher affinity than it has for the native recognition site for λ 434 Cro protein. Inhibitors which indirectly prevent binding include proteins or other substances which may disrupt the proper dimerization of λ 434 Cro protein, such as salts, enzymes (e.g. proteases, kinases, phosphorylases, glycosylases) and other proteins with which it might form unproductive dimers (either because one subunit lacks affinity for a half-site of the recognition site for λ 434 Cro protein or because dimerization causes conformational changes in λ 434 Cro protein such that it is no longer functional).
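The competition assays in a) through c) rest on the expectation that less labeled protein remains on the array as the competitor's affinity rises. One simple way to picture this is a single-site equilibrium model in which labeled protein and unlabeled competitor contend for the same recognition site. The model, the function name and all dissociation constants below are assumptions introduced for illustration (protein is taken to be in excess over array sites, so free and total concentrations are treated as equal); none of these values comes from the specification other than the 100 nM protein concentration.

```python
def labeled_occupancy(labeled_nM, kd_labeled_nM,
                      competitor_nM=0.0, kd_competitor_nM=1.0):
    """Fraction of sites holding labeled protein at equilibrium.

    Single-site competitive binding; a smaller Kd means tighter binding.
    """
    a = labeled_nM / kd_labeled_nM
    b = competitor_nM / kd_competitor_nM
    return a / (1.0 + a + b)

# Hypothetical dissociation constants, in nM.
no_competitor = labeled_occupancy(100, kd_labeled_nM=100)
equal_competitor = labeled_occupancy(100, 100, competitor_nM=100,
                                     kd_competitor_nM=100)
tight_competitor = labeled_occupancy(100, 100, competitor_nM=100,
                                     kd_competitor_nM=10)
print(no_competitor, equal_competitor, tight_competitor)
```

In this sketch the signal from labeled protein falls monotonically as the competitor binds more tightly, which is the qualitative relationship the assays exploit.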

EXAMPLE 4

[0173] Identification of Candidate Members of a Set of Co-Regulated Genes Using Arrays of the Invention

[0174] As in Example 2, an array of double-stranded nucleic acid molecules is made as described in Example 1, comprising test nucleic acid sequences of unknown protein-binding characteristics that are a) chosen because comparative sequence analysis or functional studies of a gene promoter implicates them as gene regulatory elements or b) generated de novo for use according to the invention. Alternatively, nucleic acid sequences that have been found to bind at least one known protein are used (see Example 3, above); recognition sites for a number of known proteins are listed above.

[0175] A protein complexed with a detectable label, such as a fluorescent tag or (as described below in Example 7) Green Fluorescent Protein, is incubated with the array under conditions which permit efficient protein/nucleic acid interactions, such as in a physiological salt buffer (also described above) at room temperature. After unbound protein is washed from the array, using physiological buffer minus protein as the wash solution, the array is scanned to detect the presence of label. The identities of recognition sites for a protein or proteins present on molecules of features of the array upon which label is detected are noted. Nucleic acid databases are searched with these sequences. Genes in whose regulatory regions such sequences appear, whether upstream or downstream of a gene, in introns, or in the 5′ or 3′ untranslated regions of its mature mRNA transcript, are classified as being potentially under the control of the test protein in vivo. If two or more of such genes are uncovered, they are said to form a set of candidate co-regulated genes, meaning that they may be under the control of one or more of the same trans-regulatory factors, resulting in a common expression profile, whether spatially or temporally. These genes may then undergo functional analysis by methods known in the art (e.g. expression studies, such as Northern analysis, of each in a normal genetic background as well as in one in which the test protein is mutated or absent) in order to confirm this supposition, if it is so desired.
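The database search step can be pictured as a scan of gene regulatory regions for each bound sequence, on either strand, followed by grouping of the genes that share a hit. The sketch below uses invented gene names, invented regulatory sequences and an invented bound site purely for illustration; a real search would query a sequence database rather than an in-memory dictionary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def genes_with_site(site, regulatory_regions):
    """Return sorted names of genes whose region contains `site` on either strand."""
    forward = site.upper()
    reverse = reverse_complement(site)
    return sorted(
        gene for gene, region in regulatory_regions.items()
        if forward in region.upper() or reverse in region.upper()
    )

# Hypothetical data: site recovered from a labeled feature, and toy regions.
bound_site = "TGACTCA"
regions = {
    "geneA": "CCTGACTCAGG",   # site on the forward strand
    "geneB": "AATGAGTCATT",   # site on the reverse strand
    "geneC": "ACGTACGTACGT",  # no site
}

candidates = genes_with_site(bound_site, regions)
print(candidates)
if len(candidates) >= 2:
    print("candidate co-regulated set:", candidates)
```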

EXAMPLE 5

[0176] Nucleic Acid/Protein Arrays Comprising Heterodimers

[0177] While a number of proteins will bind recognition sites for a protein as monomers or as di- or multimeric units comprising multiple copies of a single polypeptide sequence, others are able to bind only as heterogeneous aggregates, such as heterodimeric units. Recognition sites for a protein which are recognized by a heterodimer often lack the dyad symmetry of nucleic acid sequence which is relatively common among recognition sites for a protein to which protein homodimers bind. Typically, each monomer of a protein dimer (whether a homo- or heterodimer) binds what is termed a “half site”. Given a protein which is known to bind a nucleic acid as part of a heterodimer and the sequence of the half site to which it binds, it is possible to determine the range of partners with which it might pair in order to bind a complete target sequence as follows:

[0178] An array of double-stranded nucleic acid molecules is prepared as described above, wherein at least a portion of features of the array comprise a recognition site for a protein wherein the half site recognized by the protein of interest (e.g., E. coli IHF) is fused to a random sequence, such that all oligonucleotide sequences of the chosen length (for example, all hexamers or octamers) are represented on the array in order to fill the remaining positions of the recognition sites for a protein or proteins on features thereof. The test protein is labeled by methods known in the art (radioactively, fluorescently, chemiluminescently, chromogenically or using mass-tags) and then incubated with the array in the presence of a pool of proteins comprising one or a plurality of potential binding partners under conditions which permit protein dimerization and protein/nucleic acid binding. After unbound protein is washed from the array, the array is scanned in order to detect bound label, as described above. Alternatively, an unlabeled test protein is used and, after removal of unbound protein from the array, an immunological detection scheme is employed, in which a primary antibody specific for the test protein is first applied, followed by a labeled secondary antibody specific for immunoglobulins of the host species in which the primary antibody was produced. Such labeled secondary antibodies are commercially available (for example, from Vector Laboratories; Burlingame, Calif.). Methods for the production of primary antibodies against a test protein, if such antibodies are not also commercially available, are well known in the art. The sequences to which label is bound are noted; these sequences (the half site to which the test protein binds in combination with the random half site to which a member of the protein pool binds) are then used individually to isolate each of the binding partners in sufficient quantities to permit protein sequencing.
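The array design described above, a fixed half site fused to every possible sequence of a chosen length, can be enumerated directly. The sketch below is illustrative only: the fixed half site shown is a placeholder (the ACAA half site of the λ 434 Cro recognition sequence), the random portion is assumed to abut the fixed half site with no spacer, and the function name is hypothetical.

```python
from itertools import product

def half_site_library(fixed_half_site, random_length, bases="ACGT"):
    """Return every member sequence: fixed half site + one random k-mer."""
    return [fixed_half_site + "".join(kmer)
            for kmer in product(bases, repeat=random_length)]

hexamer_library = half_site_library("ACAA", 6)
print(len(hexamer_library), hexamer_library[0], hexamer_library[-1])

# Number of distinct features needed for hexamers versus octamers.
print(4 ** 6, 4 ** 8)
```

Moving from hexamers to octamers raises the number of features sixteenfold, which bears on the array cost concerns discussed in connection with this approach.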

[0179] Oligonucleotides comprising the recognition sites for a protein on which label is detected are bound to a chromatography matrix (such as cellulose) and placed in a column. A preparative amount (picomolar to millimolar concentrations in microliter to milliliter volumes) of the test protein is incubated with an aliquot of protein comparable to that used in binding the array (preferably, drawn from the same protein preparation) under identical buffer conditions, and the mixture is run over the column. After unbound protein is washed away, the bound complexes are washed from the column in a high salt buffer. The dissociated subunits are then separated chromatographically and the newly-isolated binding partner is sequenced, again by standard methods.

[0180] In order to determine whether the results gathered in vitro according to the invention reflect a gene transcriptional mechanism that is found in vivo, it is necessary to demonstrate both that the test protein and a pairing partner isolated as described in this example are co-expressed (that is, expressed together both temporally and spatially in an organism; if the two proteins do not co-exist in a cell, they cannot join to form a nucleic acid binding complex) and that the recognition site for a protein to which the heterodimer binds occurs in the genome of the organism, preferably in association with a transcriptional unit. In vivo functional studies involving a target gene comprising such a recognition site for a protein are then performed; for example, production of each of the two proteins is individually inhibited, for example with antisense RNA or a ribozyme specific for the message encoding the protein, and the effect on the regulation of the target gene is observed. The finding that both proteins are necessary for the proper expression of the target gene provides strong, if circumstantial, evidence that the two components of the heterodimer act in concert to regulate it.

EXAMPLE 6

Nucleic Acid/Protein Arrays Comprising a Chimeric Protein Heterodimer Test Subunit

[0181] The method described in Example 5, above, is well suited for the discovery of heterodimeric pairing partners and their cognate recognition sites for a protein; however, for each test protein for which pairing partners are sought, a new nucleic acid array must be synthesized, wherein the half site specific for the protein in question is incorporated into every nucleic acid member in association with a spectrum of random half-site sequences, with each random half-site represented by members of a distinct feature, as described above. Given the high cost of array design and synthesis, such a requirement might prove prohibitively expensive in certain situations.

[0182] A typical monomer which may form part of a heterodimeric nucleic-acid-binding complex is, itself, a bipartite structure, comprising a dimerization domain and a nucleic acid binding domain (e.g. a DNA binding domain, as defined above). Methods by which these subunits are separated from one another and recombined to form chimeric proteins which retain their capacity to bind nucleic acids are well known in the art (for methods of cloning, expression of cloned genes and protein purification, see Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al., Current Protocols in Molecular Biology, copyright 1987-1994, Current Protocols, copyright 1994-1998, John Wiley & Sons, Inc.). Such chimeric proteins have played a significant role in the discovery of a number of gene trans-regulatory factors, e.g. via the interaction-trap scheme in yeast (Fields and Song, 1989, Nature, 340: 245-246). According to the present invention, the dimerization domain of a protein for which pairing partners are sought is fused to the nucleic acid binding domain of a known protein, such as λ 434 Cro. Nucleic acid arrays are synthesized as in Example 5, except that the half site recognized by λ 434 Cro is used, and the procedures of isolating, identifying and characterizing interactions involving candidate pairing partners are performed, all as described above.

EXAMPLE 7

[0183] In the Examples above, proteins bound to recognition sites for a protein or proteins present on nucleic acid molecules of arrays according to the invention are labeled using a variety of methods known in the prior art; either they are labeled directly through covalent linkage of radioactive, fluorescent, chemiluminescent or chromogenic substances or of mass-tags, or indirectly via binding to labeled antibodies. The present invention encompasses a procedure in which chimeric proteins, each comprising a DNA binding domain fused in-frame to Green Fluorescent Protein (GFP), are produced by cloning, gene expression and protein isolation methods well known in the art (see Sambrook et al., 1989, supra) and incubated with nucleic acid arrays comprising recognition sites for a protein or proteins produced according to the methods of the invention in order to determine a consensus sequence of a recognition site for a given protein. Since a labeling efficiency of 100% is achieved using this scheme, the amount of fluorescence observed upon scanning of the array with an argon laser scanner is directly proportional to the amount of protein bound, not only for the determination of relative binding efficiencies of the protein to different recognition sites for a protein or proteins present on an array of the invention (as described above, using instead other labeling methods combined with a set of buffers of graded salt concentration), but even from protein preparation to protein preparation, allowing for accurate comparative quantitation of the binding efficiencies of different proteins to features of the array, if it is so desired.

[0184] After washing away any unbound fusion protein, the support bearing the array is scanned with the scanning confocal microscope (FIG. 5); the intensity of fluorescence, which is proportional to the amount of protein bound, is correlated with the sequences of nucleic acid molecules, which are known at each position of the scanned surface. The range of sequences to which a protein will bind, as well as the relative efficiency of binding to each, can then be determined. In order to interpret the results, the only source of fluorescence on the chip must be GFP; therefore, the nucleic acid molecules of the array must be unlabeled. The strand extension reaction described above can, if desired, be performed without the use of a fluorescent label; the reaction conditions are identical except that the fluorescein-labeled dATP is omitted, along with the wash step, the purpose of which is to remove unincorporated background fluorescence that ordinarily might interfere with scanning.
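
By way of illustration only, the correlation of fluorescence intensity with the known sequence at each feature may be sketched in Python as follows. The function names, the feature sequences, the intensity values and the 0.5 threshold are hypothetical and are not taken from the Examples; the sketch further assumes background-subtracted intensities and recognition sites of equal length. Relative binding is expressed as a fraction of the strongest feature, and positions at which the bound sites differ are reported as n.

```python
def relative_binding(intensities):
    """Scale each feature's intensity to that of the strongest feature (1.0)."""
    top = max(intensities.values())
    if top <= 0:
        raise ValueError("no positive fluorescence signal on the array")
    return {seq: value / top for seq, value in intensities.items()}


def consensus(intensities, threshold=0.5):
    """Consensus of the sites whose relative binding meets the threshold.

    Nucleotides shared by every bound site are kept; positions at which
    the bound sites disagree are written as 'n'.
    """
    rel = relative_binding(intensities)
    bound = [seq for seq, r in rel.items() if r >= threshold]
    if len({len(seq) for seq in bound}) != 1:
        raise ValueError("bound sites must all have the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else "n"
        for column in zip(*bound)
    )


# Hypothetical scan: feature sequence -> background-subtracted intensity.
readings = {
    "acaatatatattgt": 980.0,
    "acaatgtatattgt": 720.0,
    "acaatatatcttgt": 610.0,
    "ggcctatatattgt": 40.0,
}

print(consensus(readings))  # acaatntatnttgt
```

In practice the threshold would be chosen with reference to the signal of control features rather than fixed in advance.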

Use

[0185] The present invention is useful for the production of accurate, high-density, double-stranded nucleic acid arrays comprising recognition sites within a nucleic acid sequence or sequences for a protein or proteins, as well as protein arrays thereof, the sequences of which recognition sites within a nucleic acid sequence for a protein can be determined based upon physical location within the array. The protein arrays provided are useful in a variety of screening or identification procedures. For example, the arrays are useful for testing interactions between a protein and its corresponding recognition site within a nucleic acid sequence for a protein on a nucleic acid molecule. Alternatively, the arrays are useful for examining the effects on binding of a protein to its recognition site within a nucleic acid sequence for a protein of interactions between the protein and a second protein which binds that protein. The arrays also are useful for looking for any nucleic acid sequence that is a substrate for a protein-directed enzymatic reaction, such as is mediated by an enzyme including, but not limited to, a nuclease, or a nucleic acid modification enzyme, or isomerase. The invention is also of use in identifying gene trans-regulatory factors. The arrays also are useful for testing any one of a number of protein- or protein/nucleic acid-based biological interactions, such as those protein/protein interactions that occur in signal transduction cascades involving molecules that include, but are not limited to, kinases, proteases or receptor/ligand complexes, as well as identifying proteins, nucleic acids or other substances which might inhibit such interactions. The invention is useful for assaying protein/nucleic acid interactions where the protein or its corresponding recognition site within a nucleic acid sequence for a protein has undergone a mutation, or even where both have been mutated. The invention is of further use in determining the nucleic acid sequence of a recognition site within a nucleic acid sequence for a protein that is recognized by a given protein, or the consensus sequence of a recognition site within a nucleic acid sequence for such a protein or plurality of proteins, e.g., where such a nucleic acid sequence or sequences is/are unknown or incompletely characterized. The invention is of use in determining a consensus amino acid sequence of targeting amino acid sequences of proteins which bind a given recognition site for a protein. The arrays of the invention are additionally useful in identifying genes which may be co-regulated. The arrays are therefore ultimately useful for identifying compositions that are of potential scientific or clinical interest, particularly those with therapeutic potential.

Other Embodiments

[0186] Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims.

1 43
1 16 DNA artificial sequence primer for second strand synthesis 1 tccacactct ccaaca 16
2 16 DNA artificial sequence primer for second strand synthesis 2 ggaccctttg acttga 16
3 17 DNA Escherichia coli 3 atctggtacg accagat 17
4 22 DNA Escherichia coli 4 aaatgtgatc tagatcacat tt 22
5 13 DNA Escherichia coli misc_feature (4)..(9) n is a, c, g or t 5 aacnnnnnng tgc 13
6 14 DNA Escherichia coli 6 acaatatata ttgt 14
7 13 DNA Escherichia coli misc_feature (7)..(7) n is a, c, g or t 7 agaatwnwat tct 13
8 14 DNA Escherichia coli 8 ataaaacgtt ttat 14
9 16 DNA Escherichia coli misc_feature (7)..(10) n is a, c, g or t 9 attgatnnnn atcaat 16
10 14 DNA Escherichia coli misc_feature (5)..(10) n is a, c, g or t 10 atygnnnnnn crat 14
11 27 DNA Escherichia coli misc_feature (6)..(23) n is a, c, g or t 11 cccccnnnnn nnnnnnnnnn nnncccc 27
12 14 DNA Escherichia coli 12 ccgaaacgtt tcgg 14
13 21 DNA Escherichia coli 13 cgcartattc aygytgrtga t 21
14 17 DNA Escherichia coli misc_feature (6)..(12) n is a, c, g or t 14 ctggcnnnnn nnttgca 17
15 18 DNA Escherichia coli 15 ctktcatawa wctgtcay 18
16 19 DNA Escherichia coli 16 gaaaataatt cttatttcg 19
17 13 DNA Escherichia coli misc_feature (6)..(6) n is a, c, g or t 17 gatctnttnt ttt 13
18 14 DNA Escherichia coli misc_feature (5)..(11) n is a, c, g or t 18 gcacnnnnnn ncaa 14
19 16 DNA Escherichia coli misc_feature (7)..(7) n is a, c, g or t 19 gtgtaancgn ttacac 16
20 29 DNA Escherichia coli misc_feature (7)..(24) n is a, c, g or t 20 gttaagnnnn nnnnnnnnnn nnnncgtcc 29
21 20 DNA Escherichia coli 21 tactgtatat atatacagta 20
22 17 DNA Escherichia coli misc_feature (9)..(9) n is a, c, g or t 22 tagtaaaant tttacta 17
23 17 DNA Escherichia coli misc_feature (9)..(9) n is a, c, g or t 23 tatcaccgng cgtgata 17
24 18 DNA Escherichia coli misc_feature (6)..(13) n is a, c, g or t 24 tatccnnnnn nnnggata 18
25 13 DNA Escherichia coli misc_feature (5)..(9) n is a, c, g or t 25 tgaannnnnt tca 13
26 12 DNA Escherichia coli 26 tgaaacgttt ca 12
27 12 DNA Escherichia coli misc_feature (7)..(8) n is a, c, g or t 27 tgaaacgttt ca 12
28 17 DNA Escherichia coli misc_feature (9)..(9) n is a, c, g or t 28 tgcaccwwnw wggtgca 17
29 18 DNA Escherichia coli misc_feature (7)..(12) n is a, c, g or t 29 tgtaaannnn nntttaca 18
30 20 DNA Escherichia coli misc_feature (6)..(6) n is a, c, g or t 30 tgttangyya trrcntaaca 20
31 17 DNA Escherichia coli misc_feature (2)..(2) n is a, c, g or t 31 tntggacnnn nnngcta 17
32 28 DNA Escherichia coli misc_feature (7)..(22) n is a, c, g or t 32 ttgacannnn nnnnnnnnnn nntataat 28
33 29 DNA Escherichia coli misc_feature (7)..(23) n is a, c, g or t 33 ttgacannnn nnnnnnnnnn nnntataat 29
34 30 DNA Escherichia coli misc_feature (7)..(24) n is a, c, g or t 34 ttgacannnn nnnnnnnnnn nnnntataat 30
35 14 DNA Escherichia coli misc_feature (7)..(8) n is a, c, g or t 35 ttgawcnngw tcat 14
36 14 DNA Escherichia coli misc_feature (5)..(10) n is a, c, g or t 36 ttgcnnnnnn gcaa 14
37 14 DNA Escherichia coli misc_feature (5)..(10) n is a, c, g or t 37 ttgcnnnnnn ttgc 14
38 16 DNA Escherichia coli 38 ttgtgagcgc tcacaa 16
39 17 DNA Escherichia coli misc_feature (9)..(9) n is a, c, g or t 39 ttgtgagcng ctcacaa 17
40 18 DNA Escherichia coli 40 ttgttagaat tctaacaa 18
41 13 DNA Escherichia coli misc_feature (7)..(7) n is a, c, g or t 41 tttagcngct aaa 13
42 13 DNA Escherichia coli misc_feature (7)..(10) n is a, c, g or t 42 watcaannnn ttr 13
43 20 DNA Escherichia coli 43 watgttcgwt awcgaacatw 20
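
Several of the sequences listed above contain degenerate positions (n, and the standard IUPAC codes w, r, y and k). As a non-limiting illustration, the following Python sketch tests whether a fully specified sequence falls within such a degenerate recognition site and counts how many distinct sequences the site encompasses. The function names and the candidate sequences are hypothetical; the table covers only the codes that appear in the listing, so any other symbol would raise a KeyError.

```python
# IUPAC nucleotide codes used in the sequence listing.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag",   # purine
    "y": "ct",   # pyrimidine
    "w": "at",   # weak pairing
    "k": "gt",   # keto
    "n": "acgt", # any base
}


def _clean(seq):
    """Drop the grouping spaces used in the listing and lower-case."""
    return seq.replace(" ", "").lower()


def matches(site, candidate):
    """True if the fully specified candidate is an instance of the site."""
    site, candidate = _clean(site), _clean(candidate)
    if len(site) != len(candidate):
        return False
    return all(base in IUPAC[code] for code, base in zip(site, candidate))


def degeneracy(site):
    """Number of distinct fully specified sequences the site encompasses."""
    count = 1
    for code in _clean(site):
        count *= len(IUPAC[code])
    return count


site_7 = "agaatwnwat tct"  # sequence 7 of the listing
print(matches(site_7, "agaatacaattct"))  # True
print(matches(site_7, "agaatgcaattct"))  # False: g is not permitted at w
print(degeneracy(site_7))                # 16
```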

What is claimed is:
 1. A synthetic array of surface-bound, bimolecular, double-stranded nucleic acid molecules, said array comprising a solid support, and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a said protein is bound to a said member thereof.
 2. The array of claim 1, wherein the 3′ end of said first strand is linked to said support.
 3. The array of claim 1, wherein the 5′ end of said first strand and the 3′ end of said second strand are not linked via a covalent bond.
 4. The array of claim 1, wherein the 5′ end of said second strand is not linked to said support.
 5. The array of claim 1, wherein said recognition site within a nucleic acid sequence for a protein is selected from the group that includes naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins, synthetic variants of naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins and randomized nucleic acid sequences.
 6. The array of claim 5, wherein said recognition site within a nucleic acid sequence for a protein comprises two half-sites, wherein either is recognized by a different protein than is the other.
 7. The array of claim 1, wherein said protein which is bound to a said member thereof comprises a detectable label.
 8. The array of claim 1, wherein said protein is a chimeric protein.
 9. The array of claim 8, wherein said chimeric protein comprises a DNA-binding domain fused in-frame with a protein:protein dimerization domain.
 10. The array of claim 8, wherein said chimeric protein comprises a DNA-binding domain fused in-frame to Green Fluorescent Protein.
 11. The array of claim 1, wherein said solid support is a silica support.
 12. The array of claim 1, wherein said first strand is produced by chemical synthesis and said second strand is produced by enzymatic synthesis.
 13. The array of claim 12, wherein said first strand is used as the template on which said second strand is enzymatically produced.
 14. The array of claim 13, wherein said first strand of each said member contains at its 3′ end a binding site for an oligonucleotide primer which is used to prime enzymatic synthesis of said second strand, and at its 5′ end a variable sequence.
 15. The array of claim 12, wherein said enzymatic synthesis is performed using an enzyme.
 16. The array of claim 14, wherein said oligonucleotide primer is between 10 and 30 nucleotides in length.
 17. The array of claim 1, wherein said first strand comprises DNA.
 18. The array of claim 1, wherein said second strand comprises DNA.
 19. The array of claim 1, wherein said first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.
 20. The array of claim 1, wherein said solid support is a silica support and said first and second strands (X) each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.
 21. The array of claim 1, wherein at least a portion of said plurality have a second nucleic acid strand that is substantially complementary to- and base-paired with said first strand along the entire length of said first strand.
 22. A method for the construction of a synthetic array of surface-bound, bimolecular, double-stranded nucleic acid molecules, comprising the steps of (a) providing an array of first nucleic acid strands linked to a solid support, (b) hybridizing to said first strands of step (a) an oligonucleotide primer that is substantially complementary to a sequence comprised by a said first strand, (c) performing enzymatic synthesis of a second nucleic acid strand that is complementary to a said first strand of step (a) so as to permit Watson-Crick base pairing and so as to form an array comprising a plurality of bimolecular, double-stranded nucleic acid molecule members, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein and wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, and (d) incubating said array with a protein sample comprising a protein under conditions that permit specific binding of said protein to a said member of said array, such that a said protein becomes bound to a said recognition site within a nucleic acid sequence for a protein on a said member to form a nucleic acid protein array.
 23. The method according to claim 22, wherein the 3′ end of said first strand is linked to said support.
 24. The method according to claim 22, wherein the 5′ end of said first strand and the 3′ end of said second strand are not linked via a covalent bond.
 25. The method according to claim 22, wherein the 5′ end of said second strand is not linked to said solid support.
 26. The method according to claim 22, wherein said recognition site within a nucleic acid sequence for a protein is selected from the group that includes naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins, synthetic variants of naturally-occurring recognition sites within a nucleic acid sequence for a protein or proteins and randomized nucleic acid sequences.
 27. The method according to claim 26, wherein said recognition site within a nucleic acid sequence for a protein comprises two half-sites, wherein either is recognized by a different protein than is the other.
 28. The method according to claim 22, wherein said protein which is bound to a said member of said array comprises a detectable label.
 29. The method according to claim 22, wherein said protein is a chimeric protein.
 30. The method according to claim 29, wherein said chimeric protein comprises a DNA-binding domain fused in-frame with a protein:protein dimerization domain.
 31. The method according to claim 29, wherein said chimeric protein comprises a DNA-binding domain fused in-frame to Green Fluorescent Protein.
 32. The method according to claim 22, wherein said solid support is a silica support.
 33. The method according to claim 22, wherein said first strand of each said member contains at its 3′ end a binding site for an oligonucleotide primer which is used to prime enzymatic synthesis of said second strand, and at its 5′ end a variable sequence, wherein said binding site is present in each said member of said array.
 34. The method according to claim 33, wherein said enzymatic synthesis is performed using an enzyme.
 35. The method according to claim 22, wherein said oligonucleotide primer of step (b) is between 10 and 30 nucleotides in length.
 36. The method according to claim 22, wherein said first strand of step (a) comprises DNA.
 37. The method according to claim 22, wherein said second strand of step (c) comprises DNA.
 38. The method according to claim 22, wherein said first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.
 39. The method according to claim 22, wherein said solid support is a silica support and said first and second strands each comprise from 16 to 60 monomers selected from the group that includes ribonucleotides and deoxyribonucleotides.
 40. The method according to claim 28, wherein said protein sample comprises a candidate inhibitor of binding of said protein to a said recognition site within a nucleic acid sequence for a protein on a said member of said array.
 41. The method according to claim 28, wherein said protein sample comprises a candidate inhibitor of binding of said protein to a second protein.
 42. A method of determining a consensus nucleic acid sequence for a recognition site within a nucleic acid sequence for a protein comprising the steps of a) providing a nucleic acid protein array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a said protein comprising a detectable label is bound to a said member thereof, and b) performing a detection step to detect the presence of said label on a feature of said array, wherein nucleotides that are shared among said recognition sites within a nucleic acid sequence for a protein present on said features on which said label is detected form a consensus nucleic acid sequence for a recognition site within a nucleic acid sequence for a protein specific for said protein.
 43. A method of identifying for a first protein which binds a nucleic acid as half of a protein:protein heterodimer complex one or a plurality of candidate second proteins with which it might dimerize and bind a nucleic acid molecule in vivo, comprising the steps of a) providing a nucleic acid array comprising a solid support, and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, wherein a said recognition site within a nucleic acid sequence for a protein comprises two half-sites and wherein either of said half-sites of a said recognition site within a nucleic acid sequence for a protein is recognized by a different protein than is the other, b) incubating said array with a protein sample comprising a first protein which recognizes a first half-site of a said recognition site within a nucleic acid sequence for a protein and one or a plurality of candidate second proteins under conditions which permit heterodimerization of a said first and candidate second protein and binding of a protein:protein heterodimer to a said recognition site within a nucleic acid sequence for a protein, c) recovering a said protein:protein heterodimer complex from a said member of said array under conditions whereby said first protein and said candidate second protein dissociate from one another, and d) identifying said candidate second protein, wherein each said candidate second protein so identified represents a protein with which said first protein may interact in vivo.
 44. The method of claim 43, wherein said identifying in step d) of said candidate second protein comprises sequencing thereof.
 45. The method of claim 43, wherein said identifying in step d) of said candidate second protein comprises binding of said candidate second protein to an antibody which is specific therefor.
 46. The method according to claim 43, wherein said first protein comprises a detectable label.
 47. The method according to claim 46, further comprising the step of performing a detection step to detect the presence of said label on a feature of said array, wherein the recognition site within a nucleic acid sequence for a protein present on a feature upon which said label is detected represents a candidate recognition site within a nucleic acid sequence for a protein which said heterodimer may bind in vivo.
 48. A method of identifying candidate members of a set of co-regulated genes, comprising the steps of a) providing a nucleic acid protein array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member and wherein a said protein comprising a detectable label is bound to a said member thereof, and b) performing a detection step to detect the presence of said label on a feature of said array, wherein a gene having among its regulatory sequences one or more of said recognition sites within a nucleic acid sequence for a protein present on a said feature on which said label is detected is characterized as a candidate member of a set of co-regulated genes that are regulated by said protein.
 49. A method of assaying a candidate inhibitor of protein/nucleic acid interactions, comprising the steps of a) providing a nucleic acid array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, b) incubating said array with a protein sample comprising a protein comprising a detectable label and a candidate inhibitor of binding of said protein to a recognition site within a nucleic acid sequence for a protein on a said member of said array, under conditions which normally permit binding of said protein to said member, and c) performing a detection step to detect the presence of said label on said member, wherein the presence of said label on said member corresponds with binding of said protein to said member and wherein the negation of- or reduction in binding of said protein to said member is indicative of efficacy of said candidate inhibitor of protein:nucleic acid interactions in inhibiting binding of said protein to said recognition site within a nucleic acid sequence for a protein.
 50. A method of assaying a candidate inhibitor of a protein/protein interaction, comprising the steps of a) providing a nucleic acid array comprising a solid support and a plurality of bimolecular double-stranded nucleic acid molecule members, a said member comprising a first nucleic acid strand linked to said solid support and a second nucleic acid strand which is substantially complementary to said first strand and complexed to said first strand by Watson-Crick base pairing, wherein for at least a portion of said members, each said member comprises a recognition site within a nucleic acid sequence for a protein, wherein a recognition site within a nucleic acid sequence for a protein of a first member is different from a recognition site within a nucleic acid sequence for a protein of a second member, b) incubating said array with a protein sample comprising a first protein comprising a detectable label, wherein binding of said first protein to a recognition site within a nucleic acid sequence for a protein on a said member of said array is dependent upon an interaction between said first protein and a second protein and wherein said protein sample further comprises said second protein and a candidate inhibitor of said interaction, under conditions which normally permit said interaction, and c) performing a detection step to detect the presence of said label on a said member of said array, wherein the presence of said label on a said member corresponds with binding of said nucleic-acid-binding protein to said member and wherein the negation of- or reduction in binding of said protein to said member is indicative of efficacy of said candidate inhibitor in inhibiting said interaction between said first protein and said second protein.